STANDARD DEVIATION OF THE LONGEST COMMON SUBSEQUENCE

By Juri Lember and Heinrich Matzinger

University of Tartu and Georgia Tech 

Let $L_n$ be the length of the longest common subsequence of two
independent i.i.d. sequences of Bernoulli variables of length $n$. We
prove that the order of the standard deviation of $L_n$ is $\sqrt{n}$, provided
the parameter of the Bernoulli variables is small enough. This validates
Waterman's conjecture in this situation [Philos. Trans. R. Soc.
Lond. Ser. B 344 (1994) 383-390]. The order conjectured by Chvatal
and Sankoff [J. Appl. Probab. 12 (1975) 306-315], however, is different.

1. Introduction. Throughout this paper $X_1, X_2, \ldots$ and $Y_1, Y_2, \ldots$ are
two independent sequences of i.i.d. Bernoulli variables with parameter
$0.5 > \varepsilon > 0$:

    $\varepsilon = P(X_i = 1) = P(Y_i = 1) = 1 - P(X_i = 0) = 1 - P(Y_i = 0).$

Let $X := X_1 X_2 \cdots X_n$ and let $Y := Y_1 Y_2 \cdots Y_n$. The longest common
subsequence (LCS) of $X$ and $Y$ is any common subsequence that has the longest
possible length. The length of the LCS is denoted $L_n$. Formally, $L_n$ is the biggest
$k$ such that there exist two subsets of indices $\{i_1, \ldots, i_k\}, \{j_1, \ldots, j_k\} \subset
\{1, \ldots, n\}$ satisfying $i_1 < i_2 < \cdots < i_k$, $j_1 < j_2 < \cdots < j_k$ and
$X_{i_1} = Y_{j_1}, X_{i_2} = Y_{j_2}, \ldots, X_{i_k} = Y_{j_k}$. The main result of this paper is that
for $\varepsilon > 0$ small enough, the order of the standard deviation of $L_n$ is $\sqrt{n}$.

LCS's are a very important tool in computational biology, where they
are used for comparing DNA- and protein-alignments (see, e.g., [3, 16, 17]).
They are also used in computational linguistics, speech recognition and so
on. In all these applications, two strings with a relatively long LCS are
deemed related.

Example. Let us give an example of the practical use of LCS's. Take
the two words X = fanthastic and Y = fntastique. These two words are
very similar. They were obtained from the English word "fantastic" and the
French word "fantastique" by adding spelling mistakes. We would like the
computer to recognize the similarity. If the computer compares letter by
letter,

    f a n t h a s t i c
    f n t a s t i q u e

it finds that only one letter coincides. Comparing the $i$th letter of the first
word with the $i$th letter of the second word for all the letters is not a good
way to recognize any similarity. The reason is the missing letters: the original
position of the letters in the words gets changed. To take into account
the missing letters or added letters, we align the two words allowing for
gaps. We allow only identical letters to be matched with each other. In such a
way, we obtain a sequence of letters that is contained in X as well as in Y.
Such a sequence is a common subsequence of X and Y. Hence, the length of the
longest common subsequence is the maximum number of identical letters we can
align allowing gaps. In our example the maximum is given by the alignment

(1)     f a n t h a s t i
        f - n t - a s t i

Hence f,n,t,a,s,t,i is the longest common subsequence of the two words
and the length of the longest common subsequence, $L_n$, is 7. This indicates
that the two words are very similar.
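The count in this example can be checked mechanically. The following is a minimal Python sketch (not part of the paper) of the classical dynamic-programming recursion for the length of an LCS; the function name `lcs_length` is an illustrative choice.

```python
def lcs_length(x, y):
    """Length of a longest common subsequence of the sequences x and y."""
    # prev[j] is the LCS length of the already processed prefix of x and y[:j].
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0]
        for j, b in enumerate(y, start=1):
            if a == b:
                # Matching symbols extend the best alignment of the shorter prefixes.
                cur.append(prev[j - 1] + 1)
            else:
                # Otherwise drop the last symbol of one of the two prefixes.
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


print(lcs_length("fanthastic", "fntastique"))  # 7
```

The recursion uses time proportional to the product of the two lengths, which is sufficient for short words and for small simulations.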

To distinguish related pairs of strings from unrelated ones via the LCS method,
we need to assess the order of the fluctuation of the LCS. For this reason
the random variable $L_n$ has received a lot of attention. Nonetheless, many
questions remain open. In their pioneering paper [7], Chvatal and Sankoff
prove that the limit

(2)     $\gamma := \lim_{n \to \infty} \frac{E L_n}{n}$

exists. In [1], Alexander investigated the rate of the convergence in (2) and
showed that for a constant $C$, $E L_n - \gamma n \ge -C\sqrt{n \ln n}$. Moreover, by a
subadditivity argument

(3)     $\frac{L_n}{n} \to \gamma$  a.s. and in $L_1$

(see, e.g., [1, 17]). The constant $\gamma$ is called the Chvatal-Sankoff constant and
its value is unknown even for cases as simple as i.i.d. Bernoulli sequences.
In this case, the value of $\gamma$ obviously depends on the Bernoulli parameter
$\varepsilon$. When $\varepsilon = 0.5$, the various bounds indicate that $\gamma \approx 0.81$ [4, 11, 14]. For
a smaller $\varepsilon$, $\gamma$ is even bigger. Further bounds on $\gamma$ have been obtained by
Martinez, Hauser and Matzinger [9]. Hence, a common subsequence of two
independent Bernoulli sequences typically makes up a large part of the total
length. This implies that to make some inference, the size of the variance
$\mathrm{Var}[L_n]$ is essential. Unfortunately, not much is known about $\mathrm{Var}[L_n]$ and
its asymptotic order is one of the central open problems in string matching
theory. Monte Carlo simulations led Chvatal and Sankoff in [7] to conjecture
for $\varepsilon = 0.5$ that $\mathrm{Var}[L_n] = o(n^{2/3})$. Using an Efron-Stein type of inequality,
Steele [14] proved $\mathrm{Var}[L_n] \le 2\varepsilon(1 - \varepsilon)n$. In [15], Waterman asks whether this
linear bound can be improved. His simulations suggest that for $\varepsilon < 1/2$ this is
not the case and $\mathrm{Var}(L_n)$ grows linearly. In [6], Boutet de Monvel simulates
$\mathrm{Var}(L_n)$ for the case $\varepsilon = 1/2$ and notices the linear growth as well. However,
he adds that the linear regime of the growth is not reached before $n$ is about
10,000. He also simulates the values of the random variable

    $\frac{L_n - E L_n}{\sqrt{\mathrm{Var}(L_n)}}$

and finds its distribution close to normal.
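As a rough illustration of the quantities discussed above, the following Python sketch estimates $E L_n / n$ and $\mathrm{Var}[L_n]$ by Monte Carlo for a small $n$ and compares the estimate with Steele's bound $2\varepsilon(1-\varepsilon)n$. The values of `n`, `eps`, `trials` and the seed are arbitrary assumptions made for this sketch, and an experiment of this size says nothing about the asymptotic order of the variance.

```python
import random


def lcs_length(x, y):
    """Length of a longest common subsequence (standard dynamic programming)."""
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0]
        for j, b in enumerate(y, start=1):
            cur.append(prev[j - 1] + 1 if a == b else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def simulate_lcs(n, eps, trials, seed=0):
    """Return (sample mean, sample variance) of L_n over independent trials."""
    rng = random.Random(seed)
    samples = []
    for _ in range(trials):
        # Two independent i.i.d. Bernoulli(eps) strings of length n.
        x = [1 if rng.random() < eps else 0 for _ in range(n)]
        y = [1 if rng.random() < eps else 0 for _ in range(n)]
        samples.append(lcs_length(x, y))
    mean = sum(samples) / trials
    var = sum((s - mean) ** 2 for s in samples) / (trials - 1)
    return mean, var


n, eps = 40, 0.5                          # illustrative values only
mean, var = simulate_lcs(n, eps, trials=300)
steele_bound = 2 * eps * (1 - eps) * n    # upper bound on Var[L_n] from [14]
print(mean / n, var, steele_bound)
```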

In a series of papers, we investigate the asymptotic behavior of $\mathrm{Var}[L_n]$ in
various setups. Our goal is to find out whether there exists a constant $c > 0$
(not depending on $n$) such that $\mathrm{Var}[L_n] \ge cn$. Together with Steele's bound,
this means that $cn \le \mathrm{Var}[L_n] \le n$, that is, $\mathrm{Var}[L_n] = \Theta(n)$ [a sequence $a_n$
is of order $\Theta(n)$ if, for some constants $0 < c < C < \infty$, $cn \le a_n \le Cn$ for
all $n$ large enough]. In [5], Bonetto and Matzinger consider the asymmetric
situation where the random variables in $X$ are Bernoulli with parameter 1/2, but
$Y$ is a random i.i.d. string with three symbols. They obtain that in this
setting $\mathrm{Var}[L_n] = \Theta(n)$. In [10], Houdre, Lember and Matzinger investigate
the asymptotic behavior of the longest common increasing subsequence of
two independent Bernoulli sequences (a binary increasing sequence begins
with a block of zeros followed by a block of ones). They find that under
this additional restriction $n^{-1/2}(L_n - E L_n)$ converges in law to a functional
of two Brownian motions, implying that $\mathrm{Var}[L_n] = \Theta(n)$ holds again (here
$L_n$ designates the length of the longest common increasing subsequence).
Durringer, Lember and Matzinger [13] show that $\mathrm{Var}[L_n] = \Theta(n)$ when $Y$
is a nonrandom periodic binary sequence and $X$ is an i.i.d. Bernoulli 1/2
sequence. The nature of the optimal path has been investigated by Amsalu,
Popov and Matzinger in [2] as well as by Lember, Matzinger and Vollmer in
[12].

The relatively long history shows that determining the exact order of
the fluctuation of $L_n$ is a difficult problem. In fact, as noted in [1, 3], the
LCS problem can be reformulated as a Last Passage Percolation (LPP)
problem with correlated weights. But for standard LPP and First Passage
Percolation, the question of the exact order of the fluctuation remains open
except for the case of geometric or exponential weights, which has been solved
by Johansson.

2. Main result. The main result of this paper, Theorem 2.1, asserts that
when $\varepsilon > 0$ is small, the fluctuation of $L_n$ is of order $\sqrt{n}$. In fact, the theorem
gives only a linear lower bound for the variance of $L_n$. The linear upper
bound comes from the result of Steele [14]. Hence, Theorem 2.1 implies that
$\mathrm{Var}[L_n] = \Theta(n)$.

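Theorem 2.1. There exists $\varepsilon_0 > 0$ such that for every $\varepsilon < \varepsilon_0$, there
exists a constant $c > 0$, depending on $\varepsilon$ but not depending on $n$, that satisfies

    $\mathrm{Var}[L_n] \ge c \cdot n \quad \forall n.$

One of the main tools in this paper is a map that picks a one in the text
$X$ or $Y$ at random and changes it into a zero. Let $\tilde{X}$ and $\tilde{Y}$ designate the
texts obtained in this way.

Example. Let $n = 6$, $X = 001000$ and $Y = 101000$. The total number of
ones in the two texts is 3. Hence, we pick one of these three ones at random
with equal probability and switch it into a zero. Assume we pick the second
one in text $Y$. Then $\tilde{X} = 001000$ and $\tilde{Y} = 100000$.

Let us define $\tilde{X}$ and $\tilde{Y}$ rigorously. For a binary string $x = x_1 x_2 \cdots x_n$, we
denote by $N_1^x$ the total number of ones in $x$. So $N_1^x := \sum_{i=1}^{n} x_i$. Similarly,
$N_1^y$ is the total number of ones in $y = y_1 y_2 \cdots y_n$. The binary random strings
$\tilde{X}$ and $\tilde{Y}$ are defined by the following equations: conditional on $X = x$ and
$Y = y$, exactly one bit of the pair $(x, y)$ is changed, and for every $i \in \{1, \ldots, n\}$

    $P(\tilde{X}_i \ne x_i \mid X = x, Y = y) = \begin{cases} \frac{1}{N_1^x + N_1^y}, & \text{if } x_i = 1; \\ 0, & \text{else,} \end{cases}$

    $P(\tilde{Y}_i \ne y_i \mid X = x, Y = y) = \begin{cases} \frac{1}{N_1^x + N_1^y}, & \text{if } y_i = 1; \\ 0, & \text{else.} \end{cases}$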
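The random map can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' construction: the function name `flip_random_one` is hypothetical, and returning the strings unchanged when they contain no one is a choice made only for this sketch.

```python
import random


def flip_random_one(x, y, rng=random):
    """Return (x_tilde, y_tilde): one uniformly chosen one of x or y is set to zero.

    x and y are lists of 0/1 values; rng is `random` or a `random.Random` instance.
    """
    # All positions holding a one, tagged by the string they belong to.
    ones = [(0, i) for i, b in enumerate(x) if b == 1]
    ones += [(1, i) for i, b in enumerate(y) if b == 1]
    x_tilde, y_tilde = list(x), list(y)
    if not ones:
        # Assumption of this sketch: with no ones present, nothing is changed.
        return x_tilde, y_tilde
    which, i = rng.choice(ones)  # each one is picked with the same probability
    (x_tilde if which == 0 else y_tilde)[i] = 0
    return x_tilde, y_tilde


x = [0, 0, 1, 0, 0, 0]
y = [1, 0, 1, 0, 0, 0]
print(flip_random_one(x, y, random.Random(0)))
```

For the strings of the example above there are three possible outcomes, each with probability 1/3; one of them is $\tilde{X} = 001000$, $\tilde{Y} = 100000$.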
Let $\tilde{L}_n$ denote the length of the longest common subsequence of $\tilde{X}$ and $\tilde{Y}$. When
we change one bit in $X$ or $Y$ and flip it to the opposite value, then the length
of the LCS changes by at most one. The next theorem shows that in this
case the length of the LCS is more likely to increase by one unit than to
decrease by one unit.

Theorem 2.2. There exist constants $\alpha_1$ and $\alpha_2$ with $\alpha_1 > \alpha_2$ and a set
$B^n \subset \{0,1\}^n \times \{0,1\}^n$ such that for all $(x, y) \in B^n$

(4)     $P(\tilde{L}_n - L_n = 1 \mid X = x, Y = y) \ge \alpha_1,$

(5)     $P(\tilde{L}_n - L_n = -1 \mid X = x, Y = y) \le \alpha_2.$

Moreover, there exists an $\varepsilon_0 > 0$ such that for every $0 < \varepsilon < \varepsilon_0$

(6)     $P((X, Y) \in B^n) \ge 1 - e^{-c_1 n},$

where $c_1 > 0$ does not depend on $n$, but may depend on $\varepsilon$.

In Section 3, we prove that Theorem 2.2 implies Theorem 2.1. Let us
briefly explain the main ideas behind the proof. We define two sequences
of random binary strings $X^1, X^2, \ldots, X^{2n}$ and $Y^1, Y^2, \ldots, Y^{2n}$, all of them
having length $n$. The strings $X^k$ and $Y^k$ are defined by induction on $k$: $X^{2n}$
and $Y^{2n}$ consist only of ones; $X^{k-1}$ and $Y^{k-1}$ are obtained by choosing a one
at random in $X^k$, $Y^k$ and replacing it by a zero. Hence we use the random
map ~. We designate by $L(k)$ the length of the LCS of $X^k$ and $Y^k$. Note
that the total number of ones in the strings $X^k$ and $Y^k$ is $k$. Let $(X, Y)$ be
independent of $\{(X^k, Y^k)\}_{k \in [0, 2n]}$ and let $N_1$ designate the total number
of ones in the two strings $X$ and $Y$. It is not hard to see that $(X^k, Y^k)$ has
the same distribution as $(X, Y)$ conditional on $N_1 = k$. This implies that
$L(N_1)$ has the same distribution as $L_n$. The standard deviation of $N_1$ is of
order $\sqrt{n}$. Moreover, from Theorem 2.2 it follows directly that the (random)
map $k \mapsto L(k)$ tends to increase linearly on a certain scale. These two facts
together imply immediately that the standard deviation of $L(N_1)$, and hence
also of $L_n$, is of order $\sqrt{n}$.
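The construction of the strings $X^k$, $Y^k$ and of the map $k \mapsto L(k)$ can be sketched as follows. This Python sketch only illustrates the definition for a small $n$; the function name `lcs_along_chain`, the seed and the value of `n` are assumptions made for the illustration.

```python
import random


def lcs_length(x, y):
    """Length of a longest common subsequence (standard dynamic programming)."""
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0]
        for j, b in enumerate(y, start=1):
            cur.append(prev[j - 1] + 1 if a == b else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def lcs_along_chain(n, seed=0):
    """Return a dict k -> L(k) for one realization of the strings (X^k, Y^k)."""
    rng = random.Random(seed)
    x, y = [1] * n, [1] * n                  # X^{2n} and Y^{2n} consist only of ones
    scores = {2 * n: lcs_length(x, y)}
    for k in range(2 * n, 0, -1):
        # All ones of the current pair (X^k, Y^k); there are exactly k of them.
        ones = [(s, i) for s in (x, y) for i, b in enumerate(s) if b == 1]
        string, i = rng.choice(ones)         # each one is equally likely
        string[i] = 0                        # this yields (X^{k-1}, Y^{k-1})
        scores[k - 1] = lcs_length(x, y)
    return scores


scores = lcs_along_chain(15, seed=3)
print([scores[k] for k in sorted(scores)])
```

Both endpoints have score $n$ (all ones, respectively all zeros), and consecutive scores differ by at most one, since a single bit change alters the LCS length by at most one.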

Let us now give a heuristic argument why Theorem 2.2 holds. Recall that
in this paper, we consider the situation where a one has a small but fixed
probability $\varepsilon$. Hence, in the texts $X$ and $Y$, there is a small proportion of
ones. This implies that only a small percentage of ones can figure in an LCS.
It will turn out that the number of ones in an LCS is typically of order $\varepsilon^2 n$.
This is much less than the total number of ones in the texts $X$ and $Y$, which
is of order $2\varepsilon n$. It follows that the majority of ones in the texts $X$ and $Y$
constitute a "net loss" for the score $L_n$. Hence the number of ones tends to
influence the score $L_n$ negatively. Changing a randomly picked one into a zero
is not very likely to decrease the score. It can decrease the score only if the
chosen one is used in an LCS. But the additional zero obtained in this way
will in many cases increase the score.

Example. Let

    X = 00010000100000000000001,

    Y = 00010000000010000100000.

The longest common subsequence Z is Z = 000100000000000000000. An
alignment corresponding to Z is

    X | 0 0 0 1 0 0 0 0 1 0 0 0 0 - 0 0 0 0 - 0 0 0 0 0 1
    Y | 0 0 0 1 0 0 0 0 - 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 -
    Z | 0 0 0 1 0 0 0 0   0 0 0 0   0 0 0 0   0 0 0 0 0

The optimal solution is obtained by matching all the zeros and the first one
in both texts, but discarding all other ones. We see the general phenomenon:
since there are few ones, sometimes by chance some ones appear in respective
positions in the two texts where they can be matched. The other ones in
texts X and Y appear in places where we cannot match them
with a one. If we matched them, we would lose too many zeros. That
is why most ones cannot be used in the LCS.

The argument in the previous numerical example gives a first idea of what 
is happening. However, proving anything rigorously is difficult. The reason 
is as follows. We take $\varepsilon$ small but fixed and then let $n$ tend to infinity.
The optimal alignment (optimal alignment is the alignment which defines 
the LCS) is then going to be a global alignment. This means that typically 
some parts of the text X will be connected with parts of the text Y that 
are far away. This introduces complicated correlations between the different 
parts of the optimal alignment. Microscopically it is easy to understand the 
approximate behavior of the optimal alignment. Macroscopically however, 
little is understood about the optimal alignment. It seems that there are 
complicated long range interactions between all the different parts. 

3. Theorem 2.2 implies Theorem 2.1. The proof. In this section, we 
prove that Theorem 2.2 implies Theorem 2.1. We use some of the techniques 
developed in [5]. 

Recall that $N_1$ is the total number of ones in the two strings $X$ and $Y$.
We already mentioned briefly the definition of the random pair of strings
$(X^k, Y^k)$ for $k \in [0, 2n]$. Let us give more details. Both strings $X^k$ and $Y^k$
are binary strings of length $n$. We proceed recursively on $k$. The strings $X^{2n}$
and $Y^{2n}$ consist only of 1's. We pick a 1 in the strings $X^{2n}$, $Y^{2n}$ at random
and change it into a 0. This way we obtain $(X^{2n-1}, Y^{2n-1})$. For general $k$,
we obtain $(X^{k-1}, Y^{k-1})$ from $(X^k, Y^k)$ by choosing a 1 at random in $X^k$, $Y^k$
and changing it to the opposite value. Each one has the same probability to
get chosen. We request that, conditional on $(X^k, Y^k)$, which one in $(X^k, Y^k)$
gets chosen is independent of $\{(X^l, Y^l)\}_{l \in [k, 2n]}$. In other words, we apply
the transformation ~ so that

    $X^{k-1} := \widetilde{X^k}$ and $Y^{k-1} := \widetilde{Y^k}$.

The distribution of $(X^k, Y^k)$ is equal to the distribution of $(X, Y)$ conditional
on $N_1 = k$:

(7)     $\mathcal{L}(X^k, Y^k) = \mathcal{L}(X, Y \mid N_1 = k),$

where $\mathcal{L}(W)$ designates the distribution of the random variable $W$.

Let $L(k)$ designate the length of the LCS of $X^k$ and $Y^k$.

We assume that $\{X^k, Y^k\}_{k \in [0, 2n]}$ are independent of the random variable
$N_1$. Picking $N_1$ according to its distribution gives us random strings
$(X^{N_1}, Y^{N_1})$ that have the same distribution as $(X, Y)$. Therefore, the length
$L(N_1)$ of the LCS of $(X^{N_1}, Y^{N_1})$ has the same distribution as $L_n$. Hence

    $\mathrm{Var}[L_n] = \mathrm{Var}[L(N_1)].$

Recall that our aim is to prove that $\mathrm{Var}[L(N_1)]$ is at least of order $n$. This
follows from two facts: (1) the order of $\mathrm{Var}[N_1]$ is $n$; (2) the (random) map
$k \mapsto L(k)$ typically decreases linearly on a certain scale.

The second point follows rather directly from Theorem 2.2 and is proven in
Lemma 3.2. This section is dedicated to showing that (1) and (2) above imply
the linear lower bound for $\mathrm{Var}[L(N_1)]$. There are two technical difficulties:
(a) the map $k \mapsto L(k)$ does not increase at every point, but only on a certain
scale; (b) the increasing slope on a certain scale only holds in a domain where
typically $N_1$ takes values, but not everywhere.

Recall that for any variables $V$ and $W$,

(8)     $\mathrm{Var}[V] = \mathrm{Var}[E[V \mid W]] + E[\mathrm{Var}[V \mid W]] \ge E[\mathrm{Var}[V \mid W]],$

where $\mathrm{Var}[V \mid W]$ is the variance of the conditional distribution $\mathcal{L}(V \mid W)$.
Applying (8) to our case, we find

(9)     $\mathrm{Var}[L(N_1)] \ge E[\mathrm{Var}[L(N_1) \mid L(\cdot)]],$

where $L(\cdot)$ is the (random) map $k \mapsto L(k)$. Note that $N_1$ is independent of
$L(\cdot)$.
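The decomposition (8) can be checked exactly on a small example. The Python sketch below uses an arbitrary joint distribution of a pair $(V, W)$, chosen only for illustration, and exact rational arithmetic; it is a numerical sanity check, not a proof.

```python
from fractions import Fraction

# Joint distribution P(V = v, W = w); the numbers are purely illustrative.
joint = {
    (0, 0): Fraction(1, 8), (1, 0): Fraction(3, 8),
    (1, 1): Fraction(1, 4), (3, 1): Fraction(1, 4),
}


def mean(dist):
    return sum(value * p for value, p in dist.items())


def variance(dist):
    m = mean(dist)
    return sum((value - m) ** 2 * p for value, p in dist.items())


# Marginal laws of V and W.
law_v, law_w = {}, {}
for (v, w), p in joint.items():
    law_v[v] = law_v.get(v, 0) + p
    law_w[w] = law_w.get(w, 0) + p

# Conditional law of V given W = w.
cond = {w: {} for w in law_w}
for (v, w), p in joint.items():
    cond[w][v] = cond[w].get(v, 0) + p / law_w[w]

# Law of the random variable E[V | W], and the number E[Var[V | W]].
law_cond_mean = {}
for w, pw in law_w.items():
    m = mean(cond[w])
    law_cond_mean[m] = law_cond_mean.get(m, 0) + pw
expected_cond_var = sum(pw * variance(cond[w]) for w, pw in law_w.items())

total_var = variance(law_v)
print(total_var, variance(law_cond_mean), expected_cond_var)
```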

Let $I$ be the interval

(10)    $I := [2\varepsilon n - \sqrt{\varepsilon(1 - \varepsilon)2n},\ 2\varepsilon n + \sqrt{\varepsilon(1 - \varepsilon)2n}].$

Let $\tilde{N}_1$ be a random variable, independent of $L(\cdot)$ and having the distribution
of $N_1$ conditioned on $N_1 \in I$. From (8), it follows for every fixed $L$ that

    $\mathrm{Var}[L(N_1)] \ge \mathrm{Var}[L(N_1) \mid N_1 \in I]\,P(N_1 \in I) = \mathrm{Var}[L(\tilde{N}_1)]\,P(N_1 \in I).$

Hence, since $L$ and $N_1$ are independent,

(11)    $E[\mathrm{Var}[L(N_1) \mid L(\cdot)]] \ge E[\mathrm{Var}[L(\tilde{N}_1) \mid L(\cdot)]]\,P(N_1 \in I).$

Assume that $f : \mathbb{R} \to \mathbb{R}$ is a map such that, for a constant $c > 0$, $f'(x) \ge c$
for all $x \in \mathbb{R}$. Then, for any random variable $Y$, we have

(12)    $\mathrm{Var}[f(Y)] \ge c^2\,\mathrm{Var}[Y].$

(See Lemma 3.2 in [5] for the proof.) Hence, if the map $L(\cdot)$ had a slope
everywhere larger than $c > 0$, it would follow that $\mathrm{Var}[L(\tilde{N}_1)] \ge c^2 \cdot \mathrm{Var}[\tilde{N}_1]$.
Typically, the (random) map $k \mapsto L(k)$ does not strictly increase for every
$k \in [0, 2n]$. But it is likely that in $I$ it increases by a linear quantity. We are
next going to formulate a lemma, proven in [5] (Lemma 3.3 in [5]), which
is a modification of inequality (12) for the case when the map $k \mapsto f(k)$ does not
increase at every $k$, but has a tendency to increase on some scale.

Lemma 3.1. Let $c, m > 0$ be two constants. Let $f : I \to \mathbb{Z}$ be a nondecreasing
map that satisfies the following conditions

(13)    $f(j) - f(i) \le (j - i) \quad \forall i < j,$

(14)    $f(j) - f(i) \ge c \cdot (j - i) \quad \forall i, j$ such that $i + m \le j.$

Let $B$ be an $I$-valued random variable such that $E|f(B)| < \infty$. Then

(15)    $\mathrm{Var}[f(B)] \ge c^2 \Big(1 - \frac{2m}{c\sqrt{\mathrm{Var}[B]}}\Big) \mathrm{Var}[B].$

Recall the definition of $I$ in (10). Let $\alpha_1$ and $\alpha_2$ be the constants from
Theorem 2.2 and let $E^n_{\mathrm{slope}}$ denote the event that for all $i, j \in I$ such that
$i + n^{0.1} \le j$, we have

(16)    $L(j) - L(i) \ge \alpha_3 |i - j|,$

where

    $\alpha_3 := \frac{\alpha_1 - \alpha_2}{2}.$

In other words, the event $E^n_{\mathrm{slope}}$ says that $L(\cdot)$ has a slope of at least $\alpha_3$ on
$I$, when we look only at points which are at least $n^{0.1}$ away from each other.
The next lemma shows that the event $E^n_{\mathrm{slope}}$ has high probability, provided
Theorem 2.2 holds.

Lemma 3.2. For a constant $c_4 > 0$,

(17)    $P(E^n_{\mathrm{slope}}) \ge 1 - e^{-c_4 n^{0.1}},$

provided $n$ is sufficiently big.

Proof. Let $A^k$ denote the event that the random vector $(X^k, Y^k)$ takes
values in the set $B^n$ defined in Theorem 2.2. So

    $A^k := \{(X^k, Y^k) \in B^n\}.$

Let $A^n_I$ be the event

    $A^n_I := \bigcap_{k \in I} A^k.$

Let

    $\Delta_k := \begin{cases} L(k - 1) - L(k), & \text{when } A^k \text{ holds}; \\ 1, & \text{else.} \end{cases}$

Let $i < j$ and consider the random variable

    $\sum_{k=i+1}^{j} \Delta_k.$

When $(X^k, Y^k) = (x, y) \in B^n$, that is, $A^k$ holds, then Theorem 2.2 says that

    $P(\Delta_k = 1 \mid X^k = x, Y^k = y) \ge \alpha_1,$

    $P(\Delta_k = -1 \mid X^k = x, Y^k = y) \le \alpha_2,$

implying that $E[\Delta_k \mid A^k, X^k, Y^k] \ge \alpha_1 - \alpha_2$. Since $E[\Delta_k \mid (A^k)^c] = 1 \ge \alpha_1 - \alpha_2$,
we get

(18)    $E(\Delta_k \mid X^k, Y^k) \ge \alpha_1 - \alpha_2.$

Let, for every $k = 2n + 1, \ldots, 2$,

    $\mathcal{F}_k := \sigma(X^{2n}, Y^{2n}, \ldots, X^{k-1}, Y^{k-1}).$

These $\sigma$-algebras form a (reversed) filtration, because

    $\mathcal{F}_{2n+1} \subset \mathcal{F}_{2n} \subset \cdots \subset \mathcal{F}_2.$

The random variable $\Delta_k$ is $\mathcal{F}_k$-measurable. Hence, $V_k := \Delta_k - E[\Delta_k \mid \mathcal{F}_{k+1}]$
are reversed martingale differences. Since $-1 \le \Delta_k \le 1$, we can use the
Hoeffding-Azuma inequality to obtain

(19)    $P\Big(\sum_{k=i+1}^{j} \Delta_k - \sum_{k=i+1}^{j} E[\Delta_k \mid \mathcal{F}_{k+1}] \le -c\Big) \le \exp\Big[-\frac{2c^2}{4(j - i)}\Big].$

The inequality (18) means

    $E[\Delta_k \mid \mathcal{F}_{k+1}] \ge \alpha_1 - \alpha_2,$

implying that

(20)    $\sum_{k=i+1}^{j} E[\Delta_k \mid \mathcal{F}_{k+1}] \ge (\alpha_1 - \alpha_2)(j - i).$

With $c = \big(\frac{\alpha_1 - \alpha_2}{2}\big)(j - i)$, (19) and (20) yield

    $P\Big(\sum_{k=i+1}^{j} \Delta_k \le \Big(\frac{\alpha_1 - \alpha_2}{2}\Big)(j - i)\Big)$
        $\le P\Big(\sum_{k=i+1}^{j} \Delta_k - \sum_{k=i+1}^{j} E[\Delta_k \mid \mathcal{F}_{k+1}] \le -\Big(\frac{\alpha_1 - \alpha_2}{2}\Big)(j - i)\Big)$
        $\le \exp[-a(j - i)],$

where $a = \frac{1}{2}\big(\frac{\alpha_1 - \alpha_2}{2}\big)^2$. So

(21)    $P\Big(\sum_{k=i+1}^{j} \Delta_k \le \alpha_3 (j - i)\Big) \le \exp[-a(j - i)].$

Let $E^n_{\Delta\mathrm{slope}}$ be the event that for all $i, j \in I$ such that $2\varepsilon n \le i < j \le 2\varepsilon n + \sqrt{n}$
and $i + n^{0.1} \le j$, we have

(22)    $\sum_{k=i+1}^{j} \Delta_k \ge \alpha_3 |i - j|.$

By (21), for $n$ large enough, there exists a constant $c_2 > 0$ such that

    $P((E^n_{\Delta\mathrm{slope}})^c) \le n \exp[-a n^{0.1}] \le \exp[-c_2 \cdot n^{0.1}],$

and hence

(23)    $P(E^n_{\Delta\mathrm{slope}}) \ge 1 - e^{-c_2 n^{0.1}}.$

When the event $A^n_I$ holds, $E^n_{\mathrm{slope}}$ and $E^n_{\Delta\mathrm{slope}}$ are equivalent. Hence

    $A^n_I \cap E^n_{\Delta\mathrm{slope}} \subset E^n_{\mathrm{slope}},$

which implies

(24)    $P((E^n_{\mathrm{slope}})^c) \le P((A^n_I)^c) + P((E^n_{\Delta\mathrm{slope}})^c).$

Note that

(25)    $P((A^n_I)^c) \le \sum_{k \in I} P((A^k)^c) = \sum_{k \in I} P((A^n)^c \mid N_1 = k) \le \sum_{k \in I} \frac{P((A^n)^c)}{P(N_1 = k)},$

where

(26)    $A^n := \{(X, Y) \in B^n\}.$

By the local central limit theorem, there exists $c_3 > 0$ such that for all $k \in I$

    $P(N_1 = k) \ge \frac{1}{c_3 \sqrt{n}}.$

Applying the last inequality to (25) yields

(27)    $P((A^n_I)^c) \le \sqrt{2}\, n\, c_3\, P((A^n)^c).$

Now the inequalities (23), (27) and (24) yield

(28)    $P((E^n_{\mathrm{slope}})^c) \le \sqrt{2}\, n\, c_3\, P((A^n)^c) + e^{-c_2 n^{0.1}}.$

By Theorem 2.2, we have that $P((A^n)^c) \le e^{-c_1 n}$. Applying this to (28) gives

    $P((E^n_{\mathrm{slope}})^c) \le c_3 \sqrt{2}\, n\, e^{-c_1 n} + e^{-c_2 n^{0.1}},$

which finishes the proof. □

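When $E^n_{\mathrm{slope}}$ holds, then the map $L(\cdot)$ on $I$
satisfies the conditions of Lemma 3.1 with $m = n^{0.1}$. Hence, when $E^n_{\mathrm{slope}}$
holds, then

    $\mathrm{Var}[L(\tilde{N}_1)] \ge \alpha_3^2 \Big(1 - \frac{2n^{0.1}}{\alpha_3 \sqrt{\mathrm{Var}[\tilde{N}_1]}}\Big) \mathrm{Var}[\tilde{N}_1].$

Conditioning on $E^n_{\mathrm{slope}}$, using the fact that the variance is nonnegative and
that $\tilde{N}_1$ and $L$ are independent,

    $E[\mathrm{Var}[L(\tilde{N}_1) \mid L(\cdot)]] \ge E[\mathrm{Var}[L(\tilde{N}_1) \mid E^n_{\mathrm{slope}}]]\,P(E^n_{\mathrm{slope}})$
        $\ge \alpha_3^2 \Big(1 - \frac{2n^{0.1}}{\alpha_3 \sqrt{\mathrm{Var}[\tilde{N}_1]}}\Big) \mathrm{Var}[\tilde{N}_1]\,P(E^n_{\mathrm{slope}}).$

Plugging the last inequality into (11) yields

(29)    $E[\mathrm{Var}[L(N_1) \mid L(\cdot)]] \ge \alpha_3^2 \Big(1 - \frac{2n^{0.1}}{\alpha_3 \sqrt{\mathrm{Var}[\tilde{N}_1]}}\Big) \mathrm{Var}[\tilde{N}_1]\,P(E^n_{\mathrm{slope}})\,P(N_1 \in I).$

By the central limit theorem, $P(N_1 \in I)$ converges to

    $P(\mathcal{N}(0, 1) \in [-1, 1]) > 0$

as $n \to \infty$. [Here $\mathcal{N}(0, 1)$ designates the standard normal variable.]

Note that $N_1$ is a binomial variable with parameters $2n$ and $\varepsilon$. Hence, by
the central limit theorem,

    $\frac{\mathrm{Var}[\tilde{N}_1]}{n} = \frac{\mathrm{Var}[N_1 \mid N_1 \in I]}{n} \to 2\varepsilon(1 - \varepsilon)\,P(\mathcal{N}(0, 1) \in [-1, 1])^{-1} \int_{-1}^{1} \phi(x) x^2\,dx,$

where $\phi$ is the standard normal density. Together with Lemma 3.2, this
implies that the right-hand side of inequality (29) divided by $n$ converges to

    $\alpha_3^2\, 2\varepsilon(1 - \varepsilon) \int_{-1}^{1} \phi(x) x^2\,dx > 0.$

The inequality (9) now finishes the proof.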
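The limiting constant can be evaluated numerically. The short Python sketch below uses the antiderivative $\Phi(x) - x\phi(x)$ of $x^2\phi(x)$ to compute the integral over $[-1, 1]$; the values passed to `limit_constant` for $\alpha_3$ and $\varepsilon$ are placeholders chosen only for illustration, since the paper does not give numerical values.

```python
import math


def std_normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)


# P(N(0,1) in [-1, 1]) = erf(1 / sqrt(2)).
prob_in_window = math.erf(1.0 / math.sqrt(2.0))

# Integral of x^2 * phi(x) over [-1, 1]: the antiderivative is Phi(x) - x * phi(x),
# so the integral equals (Phi(1) - Phi(-1)) - 2 * phi(1).
second_moment_window = prob_in_window - 2.0 * std_normal_pdf(1.0)


def limit_constant(alpha3, eps):
    """Limit of the right-hand side of (29) divided by n, for given alpha3 and eps."""
    return alpha3 ** 2 * 2.0 * eps * (1.0 - eps) * second_moment_window


print(prob_in_window, second_moment_window)
print(limit_constant(alpha3=0.1, eps=0.05))  # placeholder parameter values
```

The integral is approximately 0.1987, so the limiting constant is strictly positive whenever $\alpha_3 > 0$ and $0 < \varepsilon < 1$.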

4. Aligning the ones. The rest of the paper is devoted to the proof of 
Theorem 2.2. The key ingredient for the following is the notation to describe 
the alignments. Throughout this paper we only consider alignments which 
align a symbol with a gap or with the same symbol in the other text. We 
exclude alignments which align different symbols with each other. We start 
with a simple example. 

Example. Take the two texts X = 1000001 and Y = 1001. The LCS of
X and Y is Z = 1001. It is obtained by aligning the first one in both texts
and the last one, and for the rest aligning as many zeros as possible. Text X
contains 5 zeros and text Y contains 2. The maximum number of aligned
zeros is thus min{2, 5} = 2. There are many alignments corresponding to the
LCS Z = 1001. Let us present two alignments corresponding to this LCS:

    X | 1 0 0 0 0 0 1
    Y | 1 0 0 - - - 1

or another possibility:

    X | 1 0 0 0 0 0 1
    Y | 1 - - - 0 0 1

How the zeros are aligned between the ones is not important as long as we 
align the maximum number of zeros between the ones. Hence in general we 
will only describe which ones are aligned and assume that between pairs of 
aligned ones we align the maximum number of zeros. Let us give a further 
example to illustrate this. Take the sequences: 

X = 101010101, 
Y = 11010001. 

An LCS of X and Y is 1101001. This LCS can be obtained with the following
alignment:

(30)    X | 1 0 1 0 1 0 1 0 - 1
        Y | 1 - 1 0 1 0 - 0 0 1

We call the portions between pairs of aligned ones cells.
The first cell of alignment (30) is

    X | 1
    Y | 1

The first cell is an exception. It is the only cell which is not comprised
between two pairs of aligned ones. Instead it consists of the first pair of
aligned ones and everything to its left. We only introduce this special cell in
order to simplify notations later on.
The second cell of alignment (30) is

    X | 0 1
    Y | - 1

The third cell of alignment (30) is

    X | 0 1
    Y | 0 1

The fourth cell of alignment (30) is

    X | 0 1 0 - 1
    Y | 0 - 0 0 1

Note that the second cell has one more zero in the X-part than in the Y-part.
The third cell has the same number of zeros in both parts. The fourth
cell has two zeros in the X-part and three zeros in the Y-part. Hence the
X-part has one zero less. The difference of zeros between the X-part and
the Y-part for cells 2, 3 and 4, in this order, is 1, 0 and -1. Cell number 1 has
no zeros. Hence the difference of zeros for cell number 1 is equal to zero. Let
$v_i$ denote the difference of zeros of cell $i$. We will represent alignments as the
sequence of differences of zeros of their cells. For the alignment (30), this
gives the representation $(v_1, v_2, v_3, v_4) = (0, 1, 0, -1)$. This sequence uniquely
defines the alignment of the ones.
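The passage from the aligned pairs of ones to the sequence of differences of zeros can be sketched in Python as follows. The function name `zero_differences` and the encoding of the aligned ones as 1-based position pairs are assumptions made for this illustration.

```python
def zero_differences(x, y, pairs):
    """Differences of zeros (X-part minus Y-part) of the cells of an alignment.

    x, y  : strings over {"0", "1"}
    pairs : list of (s, t), meaning x[s] is aligned with y[t] (1-based positions),
            increasing in both coordinates
    """
    diffs = []
    prev_s = prev_t = 0
    for s, t in pairs:
        if x[s - 1] != "1" or y[t - 1] != "1":
            raise ValueError("only ones may be listed as aligned pairs")
        # A cell runs from just after the previous aligned pair up to this pair.
        zeros_x = x[prev_s:s].count("0")
        zeros_y = y[prev_t:t].count("0")
        diffs.append(zeros_x - zeros_y)
        prev_s, prev_t = s, t
    return diffs


# Alignment (30): X_1-Y_1, X_3-Y_2, X_5-Y_4 and X_9-Y_8 are the aligned ones.
print(zero_differences("101010101", "11010001", [(1, 1), (3, 2), (5, 4), (9, 8)]))
# [0, 1, 0, -1]
```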

Let $X = X_1 \cdots X_n$ and $Y = Y_1 \cdots Y_n$ be given. As explained above, to
every optimal alignment corresponds a vector $v := (v_1, \ldots, v_k)$ that shows
the number of cells in the alignment ($k$) and the difference of zeros in the
cells. In every cell, the maximum number of zeros is aligned. On the other
hand, to every vector $v = (v_1, \ldots, v_k) \in \mathbb{Z}^k$ corresponds a (possibly empty)
family of alignments. All of them have the same pairs of aligned ones and
between consecutive pairs of aligned ones, the maximum number of zeros is
aligned. The alignments corresponding to $v$ can differ only in the way the
zeros between aligned ones (inside cells) are aligned. Since all the alignments
associated with $v$ have the same score (the same number of aligned zeros and
ones), we do not care how the zeros inside a cell are aligned (as long as the
maximal number of them is aligned). Therefore, with a slight imprecision, we
will speak of one alignment for the whole family associated with $v$. In other
words, we identify each vector $v$ with an alignment. In this alignment, the
number of aligned ones (cells) is $k$, the difference in the number of zeros in
cell number $i$ is $v_i$ and inside a cell, the maximal number of zeros is aligned.
So, in a sense, it is the "smallest" alignment which aligns exactly $k$ pairs of
ones with each other and has the difference of zeros in cell $i$ equal to $v_i$, for
all $i \in \{1, 2, \ldots, k\}$.

We write $|v|$ for the length of $v$. If $v \in \mathbb{Z}^k$, then $|v| = k$. Let us next define
rigorously the alignment associated with $v = (v_1, \ldots, v_k) \in \mathbb{Z}^k$.

Definition 4.1. Let $k \in \mathbb{N}$ and let $v = (v_1, \ldots, v_k) \in \mathbb{Z}^k$. Define $\pi(i), \nu(i)$
by induction on $i$:

• start with $\pi(0) = \nu(0) = 0$;

• for $i < k$, once $\pi(i), \nu(i)$ is defined, let $(\pi(i + 1), \nu(i + 1))$ be the smallest
$(s, t)$ such that all of the following three conditions are satisfied.

1. $\pi(i) < s$ and $\nu(i) < t$;

2. $X_s = Y_t = 1$;

3. the difference between the number of zeros of $X$ in the interval $[\pi(i), s]$
and the number of zeros of $Y$ in the interval $[\nu(i), t]$ is equal to $v_{i+1}$.
Hence,

    $v_{i+1} := \Big((s - \pi(i)) - \sum_{j=\pi(i)+1}^{s} X_j\Big) - \Big((t - \nu(i)) - \sum_{j=\nu(i)+1}^{t} Y_j\Big).$

If no such $(s, t)$ exists, then $\pi(i + 1) = \cdots = \pi(k) := \infty$ and $\nu(i + 1) = \cdots =
\nu(k) := \infty$.

The cell number $i$ is equal to the pair of strings:

    $C(i) := ((X_{\pi(i-1)+1}, \ldots, X_{\pi(i)}), (Y_{\nu(i-1)+1}, \ldots, Y_{\nu(i)})).$

We define the alignment $v$ as any alignment such that the following conditions
hold (provided that there exists at least one):

• $X_{\pi(i)}$ is aligned with $Y_{\nu(i)}$ for every $i = 1, \ldots, k$;

• the number of aligned zeros in the cell $C(i)$, denoted by $S_v(i)$, is the minimum
between the number of zeros in the string $X_{\pi(i-1)+1} X_{\pi(i-1)+2} \cdots X_{\pi(i)}$
and the number of zeros in the string $Y_{\nu(i-1)+1} Y_{\nu(i-1)+2} \cdots Y_{\nu(i)}$;

• after aligning $X_{\pi(k)}$ with $Y_{\nu(k)}$, we align as many zeros as possible. Let
that number be $r$.

Hence, the number of aligned zeros in the cell $C(i)$ is
equal to

    $S(i) := \min\Big\{(\pi(i) - \pi(i - 1)) - \sum_{j=\pi(i-1)+1}^{\pi(i)} X_j,\ (\nu(i) - \nu(i - 1)) - \sum_{j=\nu(i-1)+1}^{\nu(i)} Y_j\Big\}.$

To show that all $\pi(i)$, $\nu(i)$, $C(i)$, $S(i)$ depend on $v$, we write also

    $\pi_v(i) := \pi(i), \quad \nu_v(i) := \nu(i), \quad C_v(i) := C(i), \quad S_v(i) := S(i), \quad r_v := r.$

We call a cell $C_v(i)$ a $u$-cell if $v_i = u$. Thus, in a 0-cell the number of zeros
in the X-part equals the number of zeros in the Y-part, and all the zeros in the cell
are aligned. Similarly, in a 2-cell, there are 2 more zeros in the X-part, and
these 2 zeros remain unaligned.

To summarize: every $v \in \mathbb{Z}^k$ defines an alignment. This alignment corresponds
to aligning $X_{\pi_v(i)}$ with $Y_{\nu_v(i)}$, for each $i = 1, 2, \ldots, k$. These are
the aligned pairs of ones: $X_{\pi_v(i)} = Y_{\nu_v(i)} = 1$. Between the aligned pairs of
ones we assume that we align as many zeros as possible. Hence in cell number
$i$, we align $S_v(i)$ zeros (the maximum possible number). After the last pair of
aligned ones, we align as many zeros as possible. The length of the common
subsequence defined by alignment $v$ can now be computed as follows:

Each cell gives one aligned pair of ones. Hence, this part contributes $|v|$.
Then we add for each cell the number of zeros aligned. This sums up to
$\sum_{i=1}^{|v|} S_v(i)$. Finally we need to add the remaining number of zeros $r_v$ which
can be aligned but which come after the last cell. When $v \in \mathbb{Z}^k$ is such that
$\pi_v(k), \nu_v(k) \le n$, then $r_v$ is the minimum between the number of zeros in
the string $X_{\pi_v(k)} \cdots X_n$ and the number of zeros in the string $Y_{\nu_v(k)} \cdots Y_n$.
The length of the common subsequence defined by the alignment $v$ is now
equal to

    $S_v := |v| + \sum_{i=1}^{|v|} S_v(i) + r_v.$

The number $S_v$ is also called the score of the alignment $v$. This is the length
of the common subsequence corresponding to $v$.
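Definition 4.1 and the score $S_v$ can be sketched in Python as follows. This is an illustration under stated assumptions: the function name `alignment_from_vector` is hypothetical, and "smallest $(s, t)$" is interpreted as the smallest $s$ and, for that $s$, the smallest $t$. Positions are 1-based as in the text, and the two strings may have different lengths, as in the examples.

```python
def alignment_from_vector(x, y, v):
    """Return (pi, nu, score) for the alignment defined by v, or None if none exists.

    x, y : strings over {"0", "1"}
    v    : sequence of integers, the prescribed differences of zeros of the cells
    """
    pi, nu = [0], [0]
    aligned_zeros = 0
    for target in v:
        found = None
        zeros_x = 0
        for s in range(pi[-1] + 1, len(x) + 1):
            if x[s - 1] == "0":
                zeros_x += 1
                continue
            # x[s] is a one: look for the first one y[t] giving the right difference.
            zeros_y = 0
            for t in range(nu[-1] + 1, len(y) + 1):
                if y[t - 1] == "0":
                    zeros_y += 1
                elif zeros_x - zeros_y == target:
                    found = (s, t, min(zeros_x, zeros_y))
                    break
            if found:
                break
        if found is None:
            return None                       # v is not admissible for (x, y)
        s, t, cell_zeros = found
        pi.append(s)
        nu.append(t)
        aligned_zeros += cell_zeros           # this is S_v(i)
    # r_v: zeros that can still be aligned after the last pair of aligned ones.
    r = min(x[pi[-1]:].count("0"), y[nu[-1]:].count("0"))
    return pi[1:], nu[1:], len(v) + aligned_zeros + r


print(alignment_from_vector("101010101", "11010001", (0, 1, 0, -1)))
# ([1, 3, 5, 9], [1, 2, 4, 8], 7)
```

For the representation $(0, 1, 0, -1)$ of alignment (30), the sketch recovers the aligned ones and the score 7, which is the length of the common subsequence 1101001.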

Of course, it can be that, given $X = X_1 \cdots X_n$ and $Y = Y_1 \cdots Y_n$, there
is no alignment corresponding to $v$. In this case $\pi(k) = \nu(k) =
\infty$. On the other hand, if an alignment corresponding to $v$ exists, then
$\pi_v(k) \le n$ and $\nu_v(k) \le n$. A vector $v \in \mathbb{Z}^k$ satisfying the previous condition
is called admissible. Let $\mathcal{V}$ designate the set of all admissible alignments,
that is,

(31)    $\mathcal{V} := \Big\{v \in \bigcup_{k>0} \mathbb{Z}^k : \pi(|v|), \nu(|v|) \le n\Big\}.$

The set $\mathcal{V}$, obviously, depends on $X$ and $Y$. The next statement trivially
holds.

Proposition 4.1.

(32)    $L_n = \max_{v \in \mathcal{V}} \Big\{|v| + \sum_{i=1}^{|v|} S_v(i) + r_v\Big\}.$

We say an admissible alignment $v$ is optimal if $S_v = L_n$.

Let $v \in \bigcup_{k>0} \mathbb{Z}^k$ be nonrandom and define $|v|$ random cells $C_v(1), \ldots, C_v(|v|)$
as in Definition 4.1. One of the main advantages of defining alignments the
way described above is that the cells $C_v(1), C_v(2), \ldots, C_v(|v|)$ are independent,
so that we can use large deviation techniques. If $v_i = v_j = u$, then, in
addition to being independent, the cells $C_v(i)$ and $C_v(j)$ are both identically
distributed $u$-cells. In Section 6.1, we show how to efficiently construct
a $u$-cell.

5. The effect of changing a one into a zero. 

5.1. The events $B^n$ and $A^n$. Recall the main idea behind Theorem 2.2:
typically, when changing a randomly picked one into a zero, the score $L_n$ is
likelier to increase than to decrease. More precisely, we want the conditional
probability of an increase in score to be above $\alpha_1$, while the conditional
probability of a decrease should be below $\alpha_2$. The constants $\alpha_1$ and $\alpha_2$ do
not depend on $n$ and satisfy $\alpha_1 > \alpha_2$. By "conditional," we mean conditional
on $X$ and $Y$.

Example. Take the two texts X = 0001000001 and Y = 1000010101.
An optimal alignment is given by

    X | - 0 0 0 - 1 0 - 0 0 0 0 1
    Y | 1 0 0 0 0 1 0 1 0 - - - 1

The first cell in this alignment is

    X | - 0 0 0 - 1
    Y | 1 0 0 0 0 1

while the second cell is

    X | 0 - 0 0 0 0 1
    Y | 0 1 0 - - - 1

Assume that the one which we switch into a zero is $Y_8$. This is a "nonaligned"
one contained in the Y-part of cell number two. By switching $Y_8$ into a zero
the LCS increases by one unit. The reason is that in cell number two, we can
now align three zeros instead of only two. The new cell number two (after
switching $Y_8$) looks as follows:

    X | 0 0 0 0 0 1
    Y | 0 0 0 - - 1

The score gets increased because Y$ is on the side of the cell with strictly 
less zeros. We say that Y$ is on the side of a cell with less zeros. Let us 
imagine next that instead of Y% the one chosen would be X\q. This one is 
"used" in the alignment and hence switching it could result (and does in this 
case) in decreasing the optimal score L n by one unit. (This is not always 
necessary though, as can be seen with When we flip into a zero, the 
score remains the same.) We call the ones which are "used" in the alignment, 
ones that are matched by the alignment. In our example, X4 is matched with 
Yq and Xiq is matched with Y±q, Yg is not matched, nor is Y\. 
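
The effect of switching $Y_8$ can be checked numerically. The following is a minimal sketch, assuming the standard dynamic program for the LCS length; the helper names `lcs_length` and `flip_to_zero` are illustrative and not part of the paper.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of a longest common subsequence of a and b (standard DP)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def flip_to_zero(s: str, pos: int) -> str:
    """Return s with the one at 1-based position pos changed into a zero."""
    if s[pos - 1] != "1":
        raise ValueError("position does not hold a one")
    return s[: pos - 1] + "0" + s[pos:]


X, Y = "0001000001", "1000010101"
score_before = lcs_length(X, Y)                   # score of the two texts
score_after = lcs_length(X, flip_to_zero(Y, 8))   # score after switching Y_8
print(score_before, score_after)
```

The same two helpers can be applied to any other one in $X$ or $Y$ to see how the score reacts to a single flip.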

In the present situation, we have six ones. Each one has a probability to
get picked of 1/6. Only $Y_1$ and $Y_8$ increase the score when picked. (Here $Y_1$
is a nonmatched one on a side with more zeros. In general, such a one need
not increase the score when changed into a zero. It does in this example
by completely modifying the alignment and changing the number of cells.)
Hence the probability of an increase in score is equal to 2/6. Four ones, $X_4$,
$X_{10}$, $Y_6$ and $Y_{10}$, could potentially decrease the score. In our example only
$X_{10}$ actually does, so the conditional probability of a decrease is 1/6. Since,
in general, with longer sequences we cannot look in detail at every one, we
will use as upper bound for the probability of a decrease the proportion
of matched ones to the total number of ones. In our case, this gives 4/6 as
upper bound for the probability of a decrease in score. As lower bound for
the probability of an increase, we take the proportion of unmatched ones
on sides with less zeros to the total number of ones. In our example, this
proportion is equal to 2/6.

From our example, it becomes clear what we need to do. We need to prove
that typically there exists an optimal alignment $v$ for which:

(1) The proportion of ones that are on a side of a cell with less zeros
among all ones in $X$ and $Y$ is above $\alpha_1$.

(2) The proportion of ones that are matched among all ones in $X$ and
$Y$ is below $\alpha_2$.

In other words, we need to show that there exists an optimal alignment
with much less aligned ones than ones that are on a side of a cell with less
zeros.

Let $N^-(i)$ denote the number of ones on the side with less zeros in cell
number $i$. Formally, let $k \in \mathbb{N}$ and let $v = (v_1, \dots, v_k) \in \mathbb{Z}^k$ be admissible.
For $i \in [1, k]$, we define

$$N^-(i) := \begin{cases}
0, & \text{if } v_i = 0 \text{ (there is no side with less zeros)};\\[4pt]
\sum_{j=\nu(i)+1}^{\nu(i+1)-1} Y_j, & \text{if } v_i > 0 \text{ ($Y$ part has less zeros)};\\[4pt]
\sum_{j=\pi(i)+1}^{\pi(i+1)-1} X_j, & \text{if } v_i < 0 \text{ ($X$ part has less zeros)}.
\end{cases}$$

The total number of ones on sides with less zeros is

$$N^- := \sum_{i=1}^{|v|} N^-(i).$$

It is important to note that $N^-(i)$ counts the ones inside the cell, that is, the
aligned 1 that ends every cell is not counted. This means that $N^-(i)$ can also
be zero. [In the example above, $N^-(1) = 0$, $N^-(2) = 1$ and $N^- = 1$.] Such a
definition ensures that $N^- \ge \alpha_1 N_1$ guarantees $P(\tilde{L} - L = 1 \mid X, Y) \ge \alpha_1$.

Fix some constants $\alpha_1, \alpha_2$. Let $A_n$ be the event that there exists an
optimal alignment $v$ such that

1. The proportion of ones on sides with less zeros is above $\alpha_1$:
$N^- \ge \alpha_1 N_1$.

2. The proportion of aligned ones is below $\alpha_2$: $2|v| \le \alpha_2 N_1$, where $N_1$ is the
total number of ones in $X$ and $Y$.

Obviously, $A_n$ depends on the values of $\alpha_1$ and $\alpha_2$. From what we
explained it follows directly that on $A_n$, the desired inequalities hold:

$$P(\tilde{L} - L = 1 \mid X, Y) \ge \alpha_1 \quad\text{and}\quad P(\tilde{L} - L = -1 \mid X, Y) \le \alpha_2.$$

What is left to prove is that there exist $\alpha_1 > \alpha_2 > 0$ such that the event $A_n$
has probability close to one:

(33) $P(A_n) \ge 1 - \exp[-c_1 n]$, where $c_1 > 0$.

To be consistent with the notation in Theorem 2.2, let $B_n$ designate the set
of pairs of strings $(x, y)$ for which $A_n$ holds. Hence, $(x, y) \in B_n$ if and only if

$$\{X = x, Y = y\} \subset A_n.$$

We have $A_n = \{(X, Y) \in B_n\}$, and for $(x, y) \in B_n$ inequalities (4) and (5)
hold.
5.2. Breaking cells. In the previous section we argued that we need an
optimal alignment with enough ones in cell-sides with less zeros.
The problem is that many optimal alignments can have most cells with the
same number of zeros on both sides (0-cells). For such alignments there will
be few ones on cell-sides with less zeros. This problem is circumvented by
taking an optimal alignment with most cells having the same number of zeros
on both sides and applying some surgery, so as to create enough cells with
different numbers of zeros on the sides. This is done in such a manner that
the "patient" after the operation is still an optimal alignment. Let us first look
at an example.

Example. Take the texts $X = 01001001001001$ and $Y = 01010000010101$.
Take the following optimal alignment (dashes denote gaps):

    X:  0 1 0 - 0 1 0 0 1 0 0 1 0 - 0 1
    Y:  0 1 0 1 0 - 0 0 - 0 0 1 0 1 0 1

The first cell is

    X:  0 1
    Y:  0 1

The second cell is

    X:  0 0 1 0 0 1 0 0 1
    Y:  0 1 0 0 0 0 0 1

The third cell is

    X:  0 0 1
    Y:  0 1 0 1

All cells in the above alignment have the same number of zeros. Thus, there
are no sides with less zeros and $N^- = 0$. Now there is a way to remedy
this problem. Take cell number two. There are two ones which are "quasi"
aligned: $X_5$ and $Y_4$. These two ones are only one position away from being
aligned. So, if we align them, instead of the pair of zeros $X_4$ and $Y_5$, the
score remains the same. When we align the pair of ones $X_5$ and $Y_4$, we split
cell number two into two cells. This is how cell number two looks after this
transformation:

    X:  0 0 1 0 0 1 0 0 - 1
    Y:  0 - 1 0 0 - 0 0 0 1

Instead of the old cell number two, we observe the new cell number two
followed by the new cell number three. The old cell number three does not
change but is renamed and becomes cell number 4. The new cell number
two is equal to

    X:  0 0 1
    Y:  0 1

The new cell number three is

    X:  0 0 1 0 0 1
    Y:  0 0 0 0 0 1

The advantage of breaking up a cell is that the new cells have different
numbers of zeros on each side. Hence, $N^-$ tends to increase in the process
while the score remains the same. In our example, after breaking the cells,
the number of ones on sides with less zeros is 1, since the new cell number
three has a one on a side of less zeros. Changing this one into a zero will
increase the score. The breaking up process helps us get rid of the problem
of having too many cells with the same number of zeros on both sides. Note
that breaking up a cell does not necessarily increase the number $N^-$:
although after breaking a cell, both new cells have different numbers of zeros,
it might happen that both of them have no ones on the side of less zeros.
In this case, the number $N^-$ does not increase. However, once we have an
optimal alignment with enough nonzero cells, the probability is high to also
find enough ones on sides with less zeros.
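
The bookkeeping in this example can be replayed with a few lines of code. This is only a sketch under the following assumptions: a cell is stored as a pair of strings (its $X$-part and its $Y$-part, each ending with the matched one), and a cell contributes all zeros of its poorer side plus the closing pair of ones to the score. The helper names are hypothetical.

```python
def zeros(s: str) -> int:
    """Number of zeros in one side of a cell."""
    return s.count("0")


def cell_score(x_part: str, y_part: str) -> int:
    """Aligned pairs in a cell: zeros of the poorer side plus the closing ones."""
    return min(zeros(x_part), zeros(y_part)) + 1


def ones_on_poorer_side(x_part: str, y_part: str) -> int:
    """N^-(i): ones inside the side with fewer zeros (closing one excluded)."""
    zx, zy = zeros(x_part), zeros(y_part)
    if zx == zy:
        return 0
    poorer = x_part if zx < zy else y_part
    return poorer[:-1].count("1")


# Cells of the alignment before and after breaking cell number two.
before = [("01", "01"), ("001001001", "01000001"), ("001", "0101")]
after = [("01", "01"), ("001", "01"), ("001001", "000001"), ("001", "0101")]

for name, cells in (("before", before), ("after", after)):
    score = sum(cell_score(x, y) for x, y in cells)
    n_minus = sum(ones_on_poorer_side(x, y) for x, y in cells)
    print(name, score, n_minus)
```

Both cell lists concatenate to the same texts $X$ and $Y$, so the comparison isolates the effect of the surgery on the score and on $N^-$.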

Let us define what we saw in the previous numerical example in a precise 
fashion. 

Definition 5.1. Let $k \in \mathbb{N}$, $v \in \mathbb{Z}^k \cap V$, $i \le k$ and $v_i = 0$. We say that
cell $i$ of $v$ can be broken up if there exist $j$ and $j'$ satisfying all of the
following:

1. $X_j = Y_{j'} = 1$.

2. $\pi(i) < j < \pi(i+1)$ and $\nu(i) < j' < \nu(i+1)$.

3. The difference between the number of zeros in the strings

$$X_{\pi(i)+1} X_{\pi(i)+2} \cdots X_{j-1} \quad\text{and}\quad Y_{\nu(i)+1} Y_{\nu(i)+2} \cdots Y_{j'-1}$$

is one or minus one. Hence







5.3. Optimal alignment contained in $V_n$. Recall that $Y_i$ and $X_i$ are i.i.d.
Bernoulli random variables with parameter $\varepsilon$. In Section 6, we will show that
with high probability $L_n$ is larger by $0.1\varepsilon^2 n$ than half of the total amount
of zeros in $X$ and $Y$. Let us briefly explain the use of this fact. When

$$L_n \ge \frac{N_0}{2} + a,$$

where $N_0$ is the total number of zeros in $X$ and $Y$ and $a > 0$, there are two
immediate consequences:

(1) In any optimal alignment $v$ there need to be at least $a$ pairs of
aligned ones. Hence, any optimal alignment $v$ needs to be contained in the
set $\bigcup_{k \ge a} \mathbb{Z}^k$.

(2) Any optimal alignment $v$ in $\mathbb{Z}^k$ satisfies

(34) $\sum_{i=1}^{k} |v_i| \le 2k.$

Otherwise the unmatched zeros (at least $\sum_{i=1}^{k} |v_i|$) would outnumber the
aligned ones (the number of aligned ones is $2k$), bringing the score below that of an
alignment with only zeros aligned. Indeed, the number of nonaligned zeros
in the alignment $v$ is at least $\sum_{i=1}^{k} |v_i|$, so the number of pairs of aligned zeros is at
most

$$\frac{N_0 - \sum_{i=1}^{k} |v_i|}{2}$$

and (34) follows from the inequalities

$$\frac{N_0}{2} \le L_n \le \frac{N_0 - \sum_{i=1}^{k} |v_i|}{2} + k.$$

When we take $0.1\varepsilon^2 n$ for $a$, conditions (1) and (2) can be expressed by saying
that any optimal alignment $v$ is necessarily contained in the set $V_n$, where

(35) $V_n := \bigcup_{k \ge 0.1\varepsilon^2 n} V(k),$

and $V(k) \subset \mathbb{Z}^k$ is defined as follows:

(36) $V(k) := \{(v_1, v_2, \dots, v_k) \in \mathbb{Z}^k : |v_1| + \cdots + |v_k| \le 2k\}.$

The fact that any optimal alignment is typically contained in $V_n$ is very
useful. The set $V_n$ is relatively small [see the bound (50)]. So, whenever
we want to prove that a property is likely for the optimal alignment,
we prove that the property typically holds for every alignment in $V_n$. The
tremendous advantage of this approach is that for every (nonrandom) $v \in V_n$,
the alignment associated with $v$ has a simple distribution: the cells are
independent. This allows us to use large deviation techniques. In contrast,
in the optimal alignment the cells are correlated in a complex and poorly
understood manner.

A cell which has different numbers of zeros in its $X$-part and in its $Y$-part
is called a nonzero cell. We say that an alignment $v \in \mathbb{Z}^k$ has at least 1%
nonzero cells if

$$|\{i \in [1, k] : v_i \ne 0\}| \ge 0.01k.$$

Let $V_{1\%}$ be the subset of $V_n$ consisting of the alignments which have at least
1% of nonzero cells, that is,

$$V_{1\%} := \{v \in V_n : v \text{ has at least 1\% nonzero cells}\}.$$

Let

$$V^c_{1\%} := V_n - V_{1\%}.$$

5.4. The events. Recall that to a vector $v$ we associate $|v|$ random cells
$C_v(1), \dots, C_v(|v|)$ defined as a function of the i.i.d. Bernoulli random
sequences $X_1, X_2, \dots$ and $Y_1, Y_2, \dots$. In the following we define some events
that capture the typical behavior of these random cells.

Recall that $N_1$ denotes the total number of ones in $X$ and $Y$, $N_1 = \sum_{i=1}^{n} (X_i + Y_i)$.
Let $v$ be an admissible alignment, that is, $v \in V$ or, equivalently,
$\pi_v(|v|), \nu_v(|v|) \le n$.

Let $N_{1v}$ designate the number of ones up to the last cell of $v$:

$$N_{1v} := \sum_{j=1}^{\pi(|v|)} X_j + \sum_{j=1}^{\nu(|v|)} Y_j.$$

Finally, we define the number of ones after the last cell

$$R_v := \sum_{j=\pi(|v|)+1}^{n} X_j + \sum_{j=\nu(|v|)+1}^{n} Y_j.$$



Definition 5.2.

• Let $E_4$ designate the event that every optimal alignment belongs to the
set $V_n$.

• Let $D$ be the event that for all $v \in V^c_{1\%}$, at least 1% of the cells can be
broken up. So,

$$D := \bigcap_{v \in V^c_{1\%}} D_v,$$

where $D_v$ is the event that at least 1% of the cells $C_v(1), \dots, C_v(|v|)$ can
be broken up.

• Let $F$ be the event that every $v \in V_{1\%}$ has a proportion of at least $2\alpha_1$ of the ones in
$C_v(1), \dots, C_v(|v|)$ on a side of less zeros. Hence,

$$F := \bigcap_{v \in V_{1\%}} F_v,$$

where $F_v$ is the event that

$$N^- \ge 2\alpha_1 N_{1v}.$$

• Let $G$ be the event that every $v \in V_{1\%}$ has a proportion of no more than $\alpha_2$ of matched
ones. Hence

$$G := \bigcap_{v \in V_{1\%}} G_v,$$

where $G_v$ is the event that

$$2|v| \le \alpha_2 N_{1v}.$$

• Let $K$ be the event that there exists an optimal alignment $v$ such that

$$R_v \le N_{1v}.$$

In the next section, we shall prove that all the defined events hold with
high probability. Note the importance of the breaking up notion. The events
$F$ and $G$ together with the event $K$ basically prove (4) and (5) for the
case when the optimal alignment has at least 1% nonzero cells, that is,
it belongs to $V_{1\%}$. But not every optimal alignment needs to belong to $V_{1\%}$.
However, the event $D$ ensures that for every alignment from $V^c_{1\%}$, there
exists another alignment $v' \in V_{1\%}$ with the same score. So, when the events
$E_4$ and $D$ both hold, then there exists an optimal alignment in $V_{1\%}$. To this
optimal alignment we can apply $F$, $G$ and $K$ and get the inequalities (4)
and (5). These considerations lead to the next lemma, which is our main
combinatorial lemma. Recall the definition of $A_n$ in Section 5.1.

Lemma 5.1.

(37) $E_4 \cap D \cap F \cap G \cap K \subset A_n.$

Proof. Recall that $A_n$ holds if there exists an optimal alignment, say
$w$, such that the following conditions are fulfilled:

(1) the proportion of ones on sides with less zeros is above $\alpha_1$: $N^- \ge \alpha_1 N_1$;

(2) the proportion of aligned ones is below $\alpha_2$: $2|w| \le \alpha_2 N_1$.

By the event $K$ we know that there exists an optimal alignment $v$ such that
$R_v \le N_{1v}$. When $E_4$ holds, then $v$ is contained in the set $V_n$. Assume that $v$
contains less than 1% of cells with different numbers of zeros on their sides,
that is, $v \in V^c_{1\%}$. Then, the event $D$ ensures that we can break up $v$ so that
it gets at least 1% of nonzero cells and still remains optimal. Let that
alignment be $w$. By doing the break up, the number of ones after the last cell
remains unchanged, that is, $R_w = R_v$. Moreover, breaking up only increases
the number of aligned ones, so $N_{1v} \le N_{1w}$. Hence, there exists an optimal
$w \in V_{1\%}$ such that $R_w \le N_{1w}$. The event $F$ applies to $w$. Hence

(38) $\frac{N^-}{N_{1w}} \ge 2\alpha_1.$

Since $w$ is admissible, $N_1 = N_{1w} + R_w$. Hence

$$\frac{N_1}{N_{1w}} = \frac{N_{1w} + R_w}{N_{1w}} \le 2.$$

Using the last inequality with (38) yields

$$\frac{N^-}{N_1} \ge \alpha_1.$$

This is the first statement on the event $A_n$.

Since $w \in V_{1\%}$, the event $G$ guarantees that the proportion of matched ones is
at most $\alpha_2$: $2|w| \le \alpha_2 N_{1w} \le \alpha_2 N_1$. This proves the second
statement of the event $A_n$. □

5.5. Proof of Theorem 2.2. From (37) it follows that

(39) $P(A_n^c) \le P(E_4^c) + P(D^c) + P(F^c) + P(G^c) + P(K^c).$

So, the proof of Theorem 2.2 is accomplished if we show that there exist
$\alpha_1 > \alpha_2 > 0$ and $\varepsilon_0 > 0$ such that the probabilities $P(E_4^c)$, $P(D^c)$, $P(F^c)$, $P(G^c)$
and $P(K^c)$ are exponentially small in $n$, provided $\varepsilon < \varepsilon_0$. In Lemma 7.9, we
prove the existence of constants $\alpha_1 > 0$ and $C_F$, not depending on $\varepsilon$, as well
as a constant $c_F(\varepsilon)$ and $\varepsilon_1 > 0$ such that $P(F^c) \le C_F \exp[-c_F n]$, if $\varepsilon < \varepsilon_1$.
In Lemma 7.10, we prove that for every $0 < \alpha_2 < \alpha_1$, there exists $\varepsilon_2 \le \varepsilon_1$,
depending on $\alpha_2$, such that for every $\varepsilon < \varepsilon_2$, $P(G^c) \le C_G \exp[-c_G n]$, where
$C_G$ and $c_G$ are some constants (possibly depending on $\varepsilon$). In Lemmas 6.2,
7.4 and 7.11, we prove the existence of $\varepsilon_3 > 0$ and finite constants $c_E, c_D, c_K$ as
well as $C_E, C_D, C_K$, possibly depending on $\varepsilon$, such that

$$P(D^c) \le C_D \exp[-c_D n], \qquad P(E_4^c) \le C_E \exp[-c_E n],$$

$$P(K^c) \le C_K \exp[-c_K n],$$

provided $\varepsilon < \varepsilon_3$. Thus, if $\varepsilon < \varepsilon_0 := \min\{\varepsilon_1, \varepsilon_2, \varepsilon_3\}$, all the events $E_4^c$,
$D^c$, $F^c$, $G^c$, $K^c$ have exponentially small probabilities.

The proofs that $D^c$, $F^c$, $G^c$ and $K^c$ all have exponentially small probability
in $n$ use the representation of alignments as elements of $V_n$. The
corresponding events state that a certain property holds for every alignment in $V_n$. The
proof that they have high probability goes as follows: for a given nonrandom
alignment $v \in V_n$, the cells are independent. Hence, one can use large
deviation techniques. It then only remains to prove that the large deviation
rate beats the number of elements in the set $V_n$.

6. Preliminary bounds. 

6.1. A useful approach. In the sequel, we are often going to use the
following way of constructing the random sequences $X_1, X_2, \dots$ and $Y_1, Y_2, \dots$.
Let $\xi_1, \xi_2, \dots$ be a sequence of i.i.d. random variables with the following
distribution:

$$P(\xi = 0) = 1 - \varepsilon, \quad P(\xi = 1) = \varepsilon(1 - \varepsilon), \quad \dots, \quad P(\xi = n) = \varepsilon^n (1 - \varepsilon), \quad \dots.$$

The distribution of $\xi_i + 1$ is geometric. The random variables $\xi_i$ stand for
the number of 1's between the 0's: $\xi_1$ is the number of ones before the first
0, $\xi_2$ is the number of ones between the first and second 0, and so on. For
example, if $(\xi_1, \xi_2, \xi_3, \xi_4, \xi_5, \xi_6) = (0, 2, 0, 0, 1, 0)$, then before the first 0, there
are no ones; between the first and second zero, there are 2 ones; between the second
and third zero, there are again no ones, and so on. Hence, the corresponding
sequence $X_1, X_2, \dots$ begins with $0, 1, 1, 0, 0, 0, 1, 0, 0, \dots$. Similarly, with the
help of the random variables $\eta_1, \eta_2, \dots$, we construct the sequence $Y_1, Y_2, \dots$.

Recall our task: we are given a fixed vector $v = (v_1, \dots, v_k)$ and we aim to
construct the random cells $C_v(i)$ (using the i.i.d. random sequences $X$ and $Y$)
as in Definition 4.1: at first we wait for the first time such that a pair of
ones can be aligned so that the difference of zeros between the $X$- and $Y$-part is
$u = v_1$ [so we get $C_v(1)$]; then we start afresh with $u = v_2$ and so on. In terms
of the $\xi$ and $\eta$ variables, this is relatively easy. Indeed, to get a 0-cell, we look for
the smallest time $i$ such that $\xi_i \ne 0$, $\eta_i \ne 0$. So, a 0-cell can be constructed
using the stopping time $T$, where

(40) $T := \min\{i = 1, 2, \dots : \xi_i \ne 0,\ \eta_i \ne 0\}.$

To get a $(-u)$-cell ($u > 0$), we look for the smallest time $i$ such that $\xi_i \ne 0$
and $\eta_{u+i} \ne 0$. Hence, a $(-u)$-cell is constructed using the stopping time $T$,
where

(41) $T := \min\{i = 1, 2, \dots : \xi_i \ne 0,\ \eta_{u+i} \ne 0\}.$

In other words, a cell with $v_i = u$ can be viewed in the following way: we first
set $|u|$ zeros aside on side $X$ if $u > 0$ and on side $Y$ otherwise. Then we align
consecutive pairs of zeros, until we meet for the first time a pair of aligned
zeros both directly followed by a one. Let us look at a numerical example:
Example. Take $v_1 = u = 2$. Let $X = 000101\dots$ and $Y = 001\dots$. We put
aside the first two zeros in $X$. From there, we align all the zeros until we
meet two zeros both followed directly by a one. Here, this gives the cell

    X:  0 0 0 1 0 1
    Y:  0 0 1
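
The block construction and the stopping time can be sketched in code. The sketch below rests on two assumptions that go slightly beyond the text: a finite string is turned into blocks by counting the ones before each zero (trailing ones form a last, possibly incomplete block), and for $u > 0$ the stopping time is taken to be the mirror image of (41), with the shift on the $\xi$ side. All function names are hypothetical.

```python
from typing import List, Optional, Sequence


def sequence_from_blocks(blocks: Sequence[int]) -> List[int]:
    """Each block value xi_i contributes xi_i ones followed by a single zero."""
    out: List[int] = []
    for b in blocks:
        out.extend([1] * b)
        out.append(0)
    return out


def blocks_from_sequence(bits: str) -> List[int]:
    """Number of ones preceding each zero; trailing ones form a last block."""
    blocks, run = [], 0
    for ch in bits:
        if ch == "1":
            run += 1
        else:
            blocks.append(run)
            run = 0
    if run > 0:
        blocks.append(run)
    return blocks


def stopping_time(xi: Sequence[int], eta: Sequence[int], u: int) -> Optional[int]:
    """Smallest i >= 1 with a nonzero block on both sides after setting |u|
    zeros aside (on X if u > 0, on Y otherwise); None if the data run out."""
    shift_x, shift_y = (u, 0) if u > 0 else (0, -u)
    i = 1
    while i + shift_x <= len(xi) and i + shift_y <= len(eta):
        if xi[i + shift_x - 1] != 0 and eta[i + shift_y - 1] != 0:
            return i
        i += 1
    return None


xi = blocks_from_sequence("000101")
eta = blocks_from_sequence("001")
print(sequence_from_blocks((0, 2, 0, 0, 1, 0)), stopping_time(xi, eta, 2))
```

With the data of the example ($u = 2$), the stopping time is reached at the third block pair, which corresponds to the cell with $X$-part $000101$ and $Y$-part $001$.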



6.2. A bound on $L_n$. A rough lower bound for the typical length of the
LCS is obtained as follows.

1. First only align all the zeros you can. You get approximately a common
subsequence of length $(1 - \varepsilon)n$ consisting only of zeros.

2. Having aligned as many zeros as you could in 1, take the ones which
can be aligned without disturbing the already aligned zeros. In terms of the
previous subsection, this gives (approximately) additional

$$\sum_{i=1}^{(1-\varepsilon)n} \min\{\xi_i, \eta_i\}$$

ones, since between the $(i-1)$st and $i$th pair of aligned zeros, there are $\xi_i$
ones in $X$ and $\eta_i$ ones in $Y$. The random variables $\xi_i + 1$ and $\eta_i + 1$
are geometrically distributed with parameter $1 - \varepsilon$. This means that
$\min\{\xi_i, \eta_i\} + 1 \sim G(1 - \varepsilon^2)$, so

$$E \min\{\xi_i, \eta_i\} = \frac{1}{1 - \varepsilon^2} - 1.$$

So, on average the ones contribute $\left(\frac{1}{1-\varepsilon^2} - 1\right)(1 - \varepsilon)n$.

In the way described above we get a common subsequence of length about

(42) $n\left[\left(\frac{1}{1-\varepsilon^2} - 1\right)(1 - \varepsilon) + (1 - \varepsilon)\right] = n\,\frac{1-\varepsilon}{1-\varepsilon^2} = n\left[(1 - \varepsilon) + \frac{\varepsilon^2}{1+\varepsilon}\right] = \frac{n}{1+\varepsilon}.$

To stay on the safe side, we bound $L_n$ by a quantity that is a little smaller
than (42); we take $[(1 - \varepsilon) + 0.9\varepsilon^2]n$.
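
The algebra behind (42) and the "safe side" margin can be checked with exact rational arithmetic. This is a small sketch; the choice $\varepsilon = 1/10$ and the function name `heuristic_density` are illustrative assumptions only.

```python
from fractions import Fraction


def heuristic_density(eps: Fraction) -> Fraction:
    """Per-letter length in (42): aligned zeros plus the ones fitted in between."""
    zeros_part = 1 - eps
    ones_per_zero = 1 / (1 - eps**2) - 1  # E min{xi_i, eta_i}
    return ones_per_zero * zeros_part + zeros_part


eps = Fraction(1, 10)  # illustrative value of the Bernoulli parameter
density = heuristic_density(eps)
closed_form = 1 / (1 + eps)
safe_bound = (1 - eps) + Fraction(9, 10) * eps**2

print(density, closed_form, safe_bound, density > safe_bound)
```

The margin between the two quantities is $\varepsilon^2/(1+\varepsilon) - 0.9\varepsilon^2$, which is positive as soon as $\varepsilon < 1/9$.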

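Let $E$ denote the event that the LCS is longer than $((1 - \varepsilon) + 0.9\varepsilon^2)n$,
that is,

$$E := \{L_n \ge ((1 - \varepsilon) + 0.9\varepsilon^2)n\}.$$

Lemma 6.1. There exists $\varepsilon_3 > 0$ such that for every $\varepsilon < \varepsilon_3$,

$$P(E) \ge 1 - 5e^{-an},$$

where $a(\varepsilon) > 0$.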
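$$E_2^X(\delta) := \left\{\left|\sum_{i=1}^{n} X_i - n\varepsilon\right| \le \delta\varepsilon n\right\}, \qquad E_2^Y(\delta) := \left\{\left|\sum_{i=1}^{n} Y_i - n\varepsilon\right| \le \delta\varepsilon n\right\}.$$

When $E_2^X$ holds, then $X_1, \dots, X_n$ has at least $(1 - (1 + \delta)\varepsilon)n$ zeros. On $E_2^Y$,
the same holds for $Y_1, \dots, Y_n$. Let

$$E_2(\delta) := E_2^X \cap E_2^Y.$$

When $E_2$ holds, then at least $(1 - (1 + \delta)\varepsilon)n$ zeros can be aligned.

Let $\zeta_i := \min\{\xi_i, \eta_i\}$, where $\xi_i, \eta_i$ are i.i.d. random variables, $\xi_i + 1 \sim G(1 - \varepsilon)$.
So, $\zeta_i + 1 \sim G(1 - \varepsilon^2)$. From Proposition A.1 (see the Appendix),
it follows that for every $\alpha < 1$,

(43) $P\left(\sum_{i=1}^{m} \zeta_i \le \alpha\,\frac{m}{1-\varepsilon^2} - m\right) \le e^{-C(\alpha)m},$

where $C(\alpha) = \alpha - 1 - \ln \alpha$. Let $E_1$ be the event that $\sum_{i=1}^{m} \zeta_i$ is at least
$\alpha\,\frac{m}{1-\varepsilon^2} - m$, where $m(\delta) := (1 - (1 + \delta)\varepsilon)n$ and $\alpha < 1$. So

$$E_1(\alpha, \delta) := \left\{\sum_{i=1}^{m} \zeta_i \ge \alpha\,\frac{m}{1-\varepsilon^2} - m\right\}.$$

When $E_1$ and $E_2$ both hold, then one can align $m$ zeros and $\alpha\,\frac{m}{1-\varepsilon^2} - m$
ones between them. This means that on $E_1 \cap E_2$, the length of the longest
common subsequence has the following lower bound:

$$\frac{m\alpha}{1-\varepsilon^2} = \alpha\,\frac{1 - \varepsilon(1+\delta)}{1-\varepsilon^2}\,n = \alpha\left[1 - \varepsilon + \frac{\varepsilon^2}{1+\varepsilon} - \frac{\delta\varepsilon}{1-\varepsilon^2}\right]n.$$

Let us compare the right-hand side of the previous display with $((1 - \varepsilon) + 0.9\varepsilon^2)n$. Since

(44) $\alpha\left[1 - \varepsilon + \frac{\varepsilon^2}{1+\varepsilon} - \frac{\delta\varepsilon}{1-\varepsilon^2}\right] - (1 - \varepsilon) - 0.9\varepsilon^2 = (1-\varepsilon)(\alpha - 1) + \varepsilon^2\left[\frac{\alpha}{1+\varepsilon} - 0.9\right] - \frac{\alpha\delta\varepsilon}{1-\varepsilon^2},$

we see that for $\alpha = 1 - \varepsilon^3$ and $\delta = \varepsilon^2$, (44) is positive, provided $\varepsilon$ is small
enough. So, if $\alpha = 1 - \varepsilon^3$ and $\delta = \varepsilon^2$, there exists $\varepsilon_3 > 0$ such that for every
$\varepsilon < \varepsilon_3$, $E_1 \cap E_2 \subset E$, implying that

$$P(E^c) \le P(E_2^c(\varepsilon^2)) + P(E_1^c(1 - \varepsilon^3, \varepsilon^2)).$$

Finally, let us bound the probabilities. For any $\delta > 0$, from Hoeffding's
inequality, it follows that

$$P((E_2^X)^c) \le 2\exp[-2(\delta\varepsilon)^2 n], \qquad P((E_2^Y)^c) \le 2\exp[-2(\delta\varepsilon)^2 n].$$

From (43), we get that

$$P(E_1^c(\alpha, \delta)) \le \exp[-C(\alpha)m(\delta)] = \exp[-C(\alpha)(1 - (1 + \delta)\varepsilon)n].$$

Take $\alpha = 1 - \varepsilon^3$ and $\delta = \varepsilon^2$ to obtain that, for every $\varepsilon < \varepsilon_3$,

$$P(E^c) \le 4\exp[-2\varepsilon^6 n] + \exp[-C(1 - \varepsilon^3)(1 - (1 + \varepsilon^2)\varepsilon)n] \le 5e^{-an},$$

where $a(\varepsilon) = \min\{2\varepsilon^6,\ C(1 - \varepsilon^3)(1 - (1 + \varepsilon^2)\varepsilon)\}$. □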

Note that Lemma 6.1 gives a lower bound for the Chvatal-Sankoff constant:
$\frac{1}{1+\varepsilon}$.

If $0 < \delta \le 0.8\varepsilon$, then on $E_2$

(45) $N_0^X \le n[(1 - \varepsilon) + 0.8\varepsilon^2], \qquad N_0^Y \le n[(1 - \varepsilon) + 0.8\varepsilon^2],$

where $N_0^X$ and $N_0^Y$ are the numbers of zeros in $X$ and $Y$, respectively. In this
case, hence,

(46) $\frac{N_0}{2} \le n[(1 - \varepsilon) + 0.8\varepsilon^2],$

where $N_0$ is the number of zeros in $X$ and $Y$. On the other hand, if $E$ holds,
then

(47) $L_n \ge n[(1 - \varepsilon) + 0.9\varepsilon^2].$

So, if $0 < \delta \le 0.8\varepsilon$ and $E_2 \cap E$ holds, then

(48) $\frac{N_0}{2} + (0.1)\varepsilon^2 n \le L_n.$

As explained in Section 5.3, (48) implies (34), that is, $\sum_{i=1}^{k} |v_i| \le 2k$. We
also showed that (48) implies that in any optimal alignment there are at
least $(0.1)\varepsilon^2 n$ pairs of aligned ones. Thus, if $0 < \delta \le 0.8\varepsilon$ and $E_2 \cap E$ hold,
then any optimal alignment must belong to $V_n$, where the set of alignments
$V_n$ has been defined in (35). Recall that $E_4$ designates the event that every
optimal alignment belongs to $V_n$.

Lemma 6.2. There exists $\varepsilon_3 > 0$ such that for $\varepsilon < \varepsilon_3$,

$$P(E_4) \ge 1 - 5\exp[-an] - 4\exp[-2(0.8\varepsilon)^2\varepsilon^2 n].$$

Proof. We saw that $E_2(0.8\varepsilon) \cap E \subset E_4$. Proposition 4.1 now finishes
the proof. □
7. Bounding the probabilities.

7.1. Combinatorics. In the following, we use $C_n^k$ for the number of
combinations, that is,

$$C_n^k := \frac{n!}{k!(n-k)!}.$$

We also make use of the following fact: the number of $m$-dimensional vectors
with nonnegative integer entries summing up to exactly $n$ is $C_{n+m-1}^{m-1}$. To see this,
use induction on $m$: for $m = 2$, it trivially holds. Suppose that the formula
holds for $m$. Consider the $(m+1)$-dimensional vectors with nonnegative integer
entries summing up to exactly $n$. The first $m$ components determine the
vector; since the first $m$ components can sum up to $0, \dots, n$, by the induction assumption
the number of $(m+1)$-dimensional vectors with nonnegative integer entries summing
up to exactly $n$ is

(49) $C_{m-1}^{m-1} + C_{1+m-1}^{m-1} + C_{2+m-1}^{m-1} + \cdots + C_{n+m-1}^{m-1}.$

Using the fact that $C_{n+m-1}^{m} + C_{n+m-1}^{m-1} = C_{n+m}^{m}$, it is easy to show that (49)
equals $C_{n+m}^{m}$.

Lemma 7.1. For $k \ge 1$, we have

(50) $|V(k)| \le 2^k C_{3k}^{k} \le 16^k.$

Proof. Let

$$V^+(k) := \{(v_1, \dots, v_k) \in (\mathbb{Z}^+)^k : v_1 + \cdots + v_k \le 2k\},$$

where $\mathbb{Z}^+ = \{0, 1, \dots\}$. Thus, $|V^+(k)|$ is the number of $k$-dimensional vectors
with nonnegative integer entries summing up to at most $2k$. By adding
one more component, we get that $|V^+(k)|$ is equal to the number of
$(k+1)$-dimensional vectors with nonnegative integer entries summing up to
exactly $2k$. The number of such vectors is $C_{2k+k+1-1}^{k+1-1} = C_{3k}^{k}$. It follows that

$$|V^+(k)| = C_{3k}^{k} \le 2^{3k}.$$

(Here $2^{3k}$ is the number of subsets of a set of size $3k$. Of course
this upper bound is far from being optimal, but it is still sufficient for our
purpose.) For every $k$-dimensional vector, there are at most $2^k$ ways to assign
the signs of the entries. This then yields

$$|V(k)| \le 2^k C_{3k}^{k} \le 2^{4k} = 16^k. \qquad □$$
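
For small $k$ the counting in Lemma 7.1 can be confirmed by brute force. The sketch below enumerates the vectors directly; the function names are illustrative, and the enumeration is only feasible for very small $k$.

```python
from itertools import product
from math import comb


def size_v_plus(k: int) -> int:
    """|V^+(k)|: nonnegative integer vectors of length k with sum <= 2k."""
    return sum(1 for v in product(range(2 * k + 1), repeat=k) if sum(v) <= 2 * k)


def size_v(k: int) -> int:
    """|V(k)|: integer vectors of length k with |v_1| + ... + |v_k| <= 2k."""
    entries = range(-2 * k, 2 * k + 1)
    return sum(1 for v in product(entries, repeat=k) if sum(map(abs, v)) <= 2 * k)


for k in range(1, 5):
    # Columns: k, |V^+(k)|, C(3k, k), |V(k)|, 2^k C(3k, k), 16^k
    print(k, size_v_plus(k), comb(3 * k, k), size_v(k), 2**k * comb(3 * k, k), 16**k)
```

The second and third columns agree (stars and bars), and the fourth column stays below the fifth and sixth, in line with (50).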

Let

$$I(v_1, \dots, v_k) := |\{i \in \{1, \dots, k\} : v_i \ne 0\}|.$$
Lemma 7.2.

(51) $|V^c_{1\%}(k)| \le \exp[0.1262k],$

where

$$V^c_{1\%}(k) := V(k) \cap \{(v_1, \dots, v_k) \in \mathbb{Z}^k : I(v_1, \dots, v_k) < 0.01k\}.$$

Proof. Without loss of generality assume that $0.01k$ is an integer. Consider
the set of $0.01k$-dimensional vectors with nonnegative integer entries
summing up to at most $2k$. Let this set be

$$W^+(k) := \left\{(w_1, \dots, w_{0.01k}) \in (\mathbb{Z}^+)^{0.01k} : \sum_{i=1}^{0.01k} w_i \le 2k\right\}.$$

We know that

(52) $|W^+(k)| = C_{2k+0.01k+1-1}^{0.01k+1-1} = C_{2.01k}^{0.01k} = C_{2.01k}^{(0.01/2.01)(2.01)k}.$

In order to bound the number of combinations $C_m^{qm}$, where $q \in (0, 1)$ (and
$qm$ is an integer), we note that

(53) $C_m^{qm} \le q^{-qm}(1 - q)^{-m(1-q)},$

which follows from the fact that $C_m^{qm}\, q^{qm}(1 - q)^{m(1-q)} \le 1$. Using (53) with
$m = 2.01k$ and $q = 0.01/2.01$, (52) yields

$$|W^+(k)| \le (201)^{0.01k}(1.005)^{2k}.$$

There are $2^{0.01k}$ ways to assign the signs, so

$$|W(k)| \le 2^{0.01k}(201)^{0.01k}(1.005)^{2k} = (402)^{0.01k}(1.005)^{2k},$$

where

$$W(k) := \left\{(w_1, \dots, w_{0.01k}) \in \mathbb{Z}^{0.01k} : \sum_{i=1}^{0.01k} |w_i| \le 2k\right\}.$$

Obviously,

$$|V^c_{1\%}(k)| \le C_k^{0.01k}\,|W(k)|.$$

With (53), we have

$$C_k^{0.01k} \le (100)^{0.01k}\left(\frac{100}{99}\right)^{0.99k},$$

implying that

$$|V^c_{1\%}(k)| \le (40{,}200)^{0.01k}(1.005)^{2k}\left(\frac{100}{99}\right)^{0.99k} \le 1.1345^k \le \exp[0.1262k]. \qquad □$$
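
The numerical constants at the end of this proof are easy to reproduce. The short sketch below adds up the per-$k$ exponents of the three factors; the variable names are chosen here for illustration.

```python
from math import exp, log

# Per-k exponents of the three factors in the final bound of Lemma 7.2.
entries_and_signs = 0.01 * log(40200)   # from (40,200)^{0.01k}
slack = 2 * log(1.005)                  # from (1.005)^{2k}
positions = 0.99 * log(100 / 99)        # from (100/99)^{0.99k}

rate = entries_and_signs + slack + positions
print(rate, exp(rate), log(1.1345))
```

The sum of the exponents is slightly below $\ln 1.1345$, which in turn is slightly below $0.1262$.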
7.2. The event $D$. Recall that $D_v$ denotes the event that 1% of the
cells of $v$ can be broken up. Note that the following bound holds for every
$\varepsilon \in (0, \frac{1}{2}]$.

Lemma 7.3. Let $v \in V^c_{1\%}(k)$. Then

(54) $P(D_v^c) \le \exp[-0.16k].$

Proof. Let us calculate the probability that a 0-cell is breakable. We
use the approach introduced in Section 6.1. Recall the definition of $T$ in (40).
With this construction, being breakable means the existence of $1 \le i < T$
such that

$$\xi_i \ne 0, \quad \eta_i = 0, \quad \xi_{i+1} = 0, \quad \eta_{i+1} \ne 0$$

or

$$\xi_i = 0, \quad \eta_i \ne 0, \quad \xi_{i+1} \ne 0, \quad \eta_{i+1} = 0.$$

Let

$$U_1 := \min\{i = 2, \dots : \xi_{i-1} \ne 0,\ \eta_{i-1} = 0,\ \xi_i = 0,\ \eta_i \ne 0\},$$
$$U_2 := \min\{i = 2, \dots : \xi_{i-1} = 0,\ \eta_{i-1} \ne 0,\ \xi_i \ne 0,\ \eta_i = 0\},$$
$$U := U_1 \wedge U_2.$$

Let

$$\mathcal{X} := \{0, 1, 2, \dots\}, \qquad \mathcal{X}^+ := \{1, 2, \dots\}.$$

With those stopping times, the probability that a cell is breakable is $P(U < T)$.
Let us estimate it (from below). An easy way is to consider the disjoint
pairs of indexes $(1, 2), (3, 4), \dots, (2j - 1, 2j), \dots$ and restrict the stopping time
$U$ to the even integers only. So, we define the independent random vectors

$$Z_j := (\xi_{2j-1}, \eta_{2j-1}, \xi_{2j}, \eta_{2j}), \qquad j = 1, 2, \dots,$$
$$U_1' := \min\{j = 1, 2, \dots : \xi_{2j-1} \ne 0,\ \eta_{2j-1} = 0,\ \xi_{2j} = 0,\ \eta_{2j} \ne 0\} = \min\{j = 1, 2, \dots : Z_j \in A_1\},$$
$$U_2' := \min\{j = 1, 2, \dots : \xi_{2j-1} = 0,\ \eta_{2j-1} \ne 0,\ \xi_{2j} \ne 0,\ \eta_{2j} = 0\} = \min\{j = 1, 2, \dots : Z_j \in A_2\},$$
$$U' := U_1' \wedge U_2' = \min\{j = 1, 2, \dots : Z_j \in A_2 \cup A_1\},$$
$$T' := \min\{j = 1, 2, \dots : Z_j \in B_1 \cup B_2\},$$

where

$$A_1 := \mathcal{X}^+ \times \{0\} \times \{0\} \times \mathcal{X}^+, \qquad A_2 := \{0\} \times \mathcal{X}^+ \times \mathcal{X}^+ \times \{0\},$$
$$B_1 := \mathcal{X}^+ \times \mathcal{X}^+ \times \mathcal{X} \times \mathcal{X}, \qquad B_2 := \mathcal{X} \times \mathcal{X} \times \mathcal{X}^+ \times \mathcal{X}^+.$$

Clearly,

$$2U' - 1 \ge U, \qquad 2T' - 1 \le T,$$

$$P(U < T) \ge P(2U' - 1 < T) \ge P(2U' - 1 < 2T' - 1) = P(U' < T').$$

Since the random variables $Z_j$ are independent, the probability on the
right-hand side is easy to calculate:

$$P(U' < T') = \frac{P(Z_1 \in A_2 \cup A_1)}{P(Z_1 \in A_2 \cup A_1) + P(Z_1 \in B_2 \cup B_1)} = \frac{2\varepsilon^2(1 - \varepsilon)^2}{2\varepsilon^2(1 - \varepsilon)^2 + 2\varepsilon^2 - \varepsilon^4} = \frac{2(1 - \varepsilon)^2}{2(1 - \varepsilon)^2 + 2 - \varepsilon^2}.$$

It is easy to check that the function

$$g(\varepsilon) := \frac{2(1 - \varepsilon)^2}{2(1 - \varepsilon)^2 + 2 - \varepsilon^2}$$

is decreasing in $[0, \frac{1}{2}]$, which implies

$$g(\varepsilon) \ge \frac{2(1/2)^2}{2(1/2)^2 + 2 - (1/2)^2} = \frac{2}{9}.$$

Let $v = (v_1, \dots, v_k) \in V^c_{1\%}(k)$. This means that the number of zero cells $m$ is at
least $0.99k$. Let $J$ be the index set of zero cells and let, for every $j \in J$, $I_j$
be the Bernoulli variable that is one if and only if the cell $v_j$ is breakable.
Clearly, the random variables $I_j$ are i.i.d. and for every $\varepsilon \in (0, \frac{1}{2}]$,
$p(\varepsilon) := P(I_j = 1) \ge g(\varepsilon) \ge \frac{2}{9}$.

In the following, we use the following result: let $Z$ be a binomial random
variable with parameters $p$ and $m$. Let $0 < a < p$. Then

(55) $P(Z \le am) \le \left(\frac{p}{a}\right)^{am}\left(\frac{1-p}{1-a}\right)^{(1-a)m} \le \left(\frac{p}{a}\right)^{am} \exp[(a - p)m]$

(see, e.g., [8], page 130). Using (55) and the facts that $p := p(\varepsilon) \ge \frac{2}{9}$ as well
as $m \ge 0.99k$, we get

$$P(D_v^c) = P\left(\sum_{j \in J} I_j < 0.01m\right) \le (100p)^{0.01m} \exp[(0.01 - p)m]$$
$$\le \exp[(0.01 \ln 100 + (0.01 - p))m]$$
$$\le \exp[(0.047 + (0.01 - p))0.99k] \le \exp[-0.16k]. \qquad □$$
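
The two numerical facts used in this proof, the monotonicity of $g$ and the final exponent, can be checked directly. The sketch below evaluates $g$ on a grid (a numerical check, not a proof of monotonicity) and computes the exponent with the exact value of $0.01 \ln 100$; the variable names are illustrative.

```python
from math import log


def g(eps: float) -> float:
    """Lower bound for the probability that a 0-cell is breakable."""
    return 2 * (1 - eps) ** 2 / (2 * (1 - eps) ** 2 + 2 - eps**2)


# Check on a grid that g is decreasing on [0, 1/2].
grid = [i / 1000 for i in range(501)]
values = [g(e) for e in grid]
is_decreasing = all(a >= b for a, b in zip(values, values[1:]))

p_min = g(0.5)  # smallest value of g on (0, 1/2]
exponent_per_k = (0.01 * log(100) + 0.01 - p_min) * 0.99

print(is_decreasing, p_min, exponent_per_k)
```

With $p = 2/9$ the exponent per unit of $k$ is below $-0.16$, as used in (54).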

Lemma 7.4. There exists $C_D < \infty$ such that

(56) $P(D^c) \le C_D \exp[-0.0338(0.1\varepsilon^2)n].$

Proof. Let

$$D(k) := \bigcap_{v \in V^c_{1\%}(k)} D_v.$$

With (51) and (54), we get

$$P(D^c(k)) \le \sum_{v \in V^c_{1\%}(k)} P(D_v^c) \le \exp[(-0.16 + 0.1262)k] = \exp[-0.0338k].$$

Since $k \ge (0.1\varepsilon^2)n$, we find:

$$P(D^c) \le \sum_{k \ge (0.1\varepsilon^2)n} P(D^c(k)) \le \sum_{k \ge (0.1\varepsilon^2)n} \exp[-0.0338k] \le C_D \exp[-0.0338(0.1\varepsilon^2)n],$$

where

$$C_D := (1 - \exp[-0.0338])^{-1}. \qquad □$$

7.3. The event $F$. The following large deviation result is proven in the
Appendix.

Lemma 7.5 (Large deviation for geometric random variables). Let $G_1, \dots, G_m$
be i.i.d. random variables with geometric distribution $G(p)$. There exists
$0 < \alpha_0 < 1$, not depending on $p$, such that for every $\alpha \le \alpha_0$, the inequality

(57) $P\left(\sum_{i=1}^{m} G_i \le \frac{\alpha}{p}\,m\right) \le \exp[-300m] \qquad \forall m \ge 1$

holds. Moreover, for every $C > 0$ there exists $1 < A_0(C) < \infty$, such that for
every $A \ge A_0$

(58) $P\left(\sum_{i=1}^{m} G_i \ge \frac{A}{p}\,m\right) \le \exp[-Cm] \qquad \forall m \ge 1.$

Recall the definition of the event $F$: $\forall v \in V_{1\%}$, $N^- \ge 2\alpha_1 N_{1v}$.

Let $u$ be a nonnegative integer. Let us consider a $(-u)$-cell. Recall the
random variables $\xi_i$ and $\eta_i$ as in Section 6.1 and recall the random variable
$T$ as in (41), which is the smallest time $T$ such that $\xi_T \ne 0$ and $\eta_{u+T} \ne 0$. Let
$T_x(j)$ be the index of the $j$th $\xi_i$ such that $\xi_i \ne 0$. So

$$T_x(1) = \min\{i \ge 1 : \xi_i \ne 0\},$$
$$T_x(j + 1) = \min\{i > T_x(j) : \xi_i \ne 0\}.$$

Let

(59) $\rho^- := \min\{j = 1, 2, \dots : \eta_{u+T_x(j)} \ne 0\}.$

Hence $\rho^-$ is the number of $\xi_i$'s in the cell that are not 0 (including the one
that is aligned). With this notation,

$$T = T_x(\rho^-).$$

For a $(-u)$-cell, the number of 0's in $X$ is smaller than the number of 0's in
$Y$. Let us estimate (from below) the number of 1's inside the $X$-side, $N^-$.
This number does not count the aligned one, so the number is clearly at
least $\rho^- - 1$, that is, $N^- \ge \rho^- - 1$, where the equality holds if and only if

$$\xi_{T_x(j)} = 1, \qquad j = 1, \dots, \rho^- - 1.$$

The random variable $\rho^-$ has geometric distribution with parameter $\varepsilon$.
Indeed, since $X$ and $Y$ are independent, from the right-hand side of (59) it
follows that

$$P(\rho^- = n) = P(\eta_{u+T_x(1)} = 0, \dots, \eta_{u+T_x(n-1)} = 0,\ \eta_{u+T_x(n)} \ne 0) = (1 - \varepsilon)^{n-1}\varepsilon.$$

Let $v = (v_1, \dots, v_k)$. Recall that $N^-$ is the number of ones on the sides with
fewer 0's of nonzero cells. At first, we give a lower bound on $N^-$.

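Lemma 7.6. There exist $\gamma > 0$, not depending on $\varepsilon$, and $\varepsilon_1 > 0$ such
that for every $v = (v_1, \dots, v_k) \in V_{1\%}$ we have

(60) $P(F_{1v}^c) \le \exp[-3k]$, where $F_{1v} := \left\{N^- \ge \frac{\gamma}{\varepsilon}\,k\right\},$

provided $\varepsilon < \varepsilon_1$.

Proof. Let $v = (v_1, \dots, v_k) \in V_{1\%}$. Let $I$ be the index set of nonzero
cells, $|I| \ge 0.01k$. Let us estimate (from below) the number of 1's in the side of
fewer 0's:

$$N^- = \sum_{i \in I} N^-(i).$$

For a cell $v_i \ne 0$, we have that $N^-(i) \ge \rho_i^- - 1$, where $\rho_i^-$, $i \in I$, are geometrically
distributed random variables with parameter $\varepsilon$ as in (59). So,

(61) $N^- \ge \sum_{i \in I} \rho_i^- - |I|.$

Let $\alpha_0$ be as in Lemma 7.5. It does not depend on $\varepsilon$. Let $\gamma > 0$ and $\varepsilon_1 > 0$ be
such that $100\gamma + \varepsilon \le \alpha_0$ whenever $\varepsilon < \varepsilon_1$.
Without loss of generality, we may assume that $v_1, \dots, v_{|I|}$ are nonzero.
Thus, if $\varepsilon < \varepsilon_1$, then $100\gamma + \varepsilon \le \alpha_0$, and from (57) we obtain, with $m := 0.01k$,

$$P(F_{1v}^c) \le P\left(\sum_{i=1}^{|I|} (\rho_i^- - 1) < \frac{\gamma k}{\varepsilon}\right) \le P\left(\sum_{i=1}^{m} (\rho_i^- - 1) < \frac{\gamma k}{\varepsilon}\right)$$
$$= P\left(\sum_{i=1}^{m} \rho_i^- < \frac{100\gamma + \varepsilon}{\varepsilon}\,m\right) \le P\left(\sum_{i=1}^{m} \rho_i^- \le \frac{\alpha_0}{\varepsilon}\,m\right)$$
$$\le \exp[-300(0.01)k] = \exp[-3k]. \qquad □$$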



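Let

$$F_1(k) := \bigcap_{v \in V_{1\%} \cap V(k)} F_{1v} \qquad\text{and}\qquad F_1 := \bigcap_{k \ge (0.1\varepsilon^2)n} F_1(k).$$

By (50) and Lemma 7.6, it holds: if $\varepsilon < \varepsilon_1$, then

$$P(F_1(k)^c) \le \sum_{v \in V(k)} P(F_{1v}^c) \le 16^k \exp[-3k] = \exp[(\ln 16 - 3)k] \le \exp[-0.2k].$$

Hence

(62) $P(F_1^c) \le \sum_{k \ge (0.1\varepsilon^2)n} P(F_1(k)^c) \le \sum_{k \ge (0.1\varepsilon^2)n} \exp[-0.2k] \le C_{1F} \exp[-0.2(0.1\varepsilon^2)n],$

where

$$C_{1F} := (1 - \exp[-0.2])^{-1}.$$

Let $v = (v_1, \dots, v_k) \in V(k)$ be given. Let $C_v(1), \dots, C_v(k)$ be the corresponding
cells. Let $\rho_j$, $1 \le j \le k$, be the number of nonzero $\xi_i$'s in the cell $C_v(j)$.
Clearly $\rho_1, \dots, \rho_k$ are independent. The distribution of $\rho_j$ is geometric with
parameter $\varepsilon$, if $v_j \le 0$. Otherwise, there exists a geometric random variable
with parameter $\varepsilon$, say $\tilde{\rho}_j$, such that $\tilde{\rho}_j \le \rho_j \le \tilde{\rho}_j + v_j$. Since $v \in V(k)$,
$\sum_j |v_j| \le 2k$. Let us estimate from above the quantity $\rho_v := \sum_{j=1}^{k} \rho_j$.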
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Lemma 7.7. There exist a constant B not depending on e such that for 
every v = (v±, . . . , v^) S V n we have 

P{F 2v ) < exp[-(lnl6 + l)k], where F 2v := < —k\. 

PROOF. Let B be such that B — 1 > A (lnl6 + 1), let v = (vi, . . . ,Vk) G 
V(k). By (58), 

B. x 



< p f E Pi+ E p7 + 2 ^7 fc ) 

\7:«,-<0 J':«i >0 / 



E pj+ E ^^*) 
<p( E Pi+ E 

\j':iy<0 j':i>j>0 / 

< exp[- (In 16 + !)&]. □ 



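Let

$$F_2(k) := \bigcap_{v \in V(k)} \Big\{\rho_v \le \frac{B}{\varepsilon}k\Big\}, \qquad F_2 := \bigcap_{k \ge (0.1\varepsilon^2)n} F_2(k).$$

Then, similarly to (62),

$$P((F_2(k))^c) \le \sum_{v \in V(k)} P(F_{2v}^c) \le \exp[-k((\ln 16 + 1) - \ln 16)] = \exp[-k],$$

$$P(F_2^c) \le C_{2F} \exp[-0.1\varepsilon^2 n],$$

where

$$C_{2F} := (1 - \exp[-1])^{-1}.$$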

Next, using Lemma 7.7, we estimate from above the random number of
ones in the X-side of the cells $C_v(1), \ldots, C_v(|v|)$.
Lemma 7.8. There exists a constant $A < \infty$, independent of $\varepsilon$, such that for every $v = (v_1, \ldots, v_k) \in V_n$, we have

$$P\Big(\sum_{j=1}^{\pi(k)} X_j > \frac{Ak}{\varepsilon(1-\varepsilon)}\Big) \le 2\exp[-(\ln 16 + 1)k].$$

PROOF. Let $v = (v_1, \ldots, v_k) \in V(k)$. Note that

$$P(\xi_i = k \mid \xi_i \ne 0) = \varepsilon^{k-1}(1 - \varepsilon), \qquad k = 1, 2, \ldots.$$
The number of 1's on the X-side of the cell $C_v(j)$ is

$$\sum_{i=1}^{\rho(j)-1} G_i + 1, \tag{63}$$

where the $G_i$ are geometrically distributed random variables with parameter $1 - \varepsilon$, independent of $\rho(j)$. Here $\rho(j) - 1$ is the number of $\xi$'s inside the cell $C_v(j)$ and the additional one is the one that is matched. Hence,



Pv 



(64) 



7r(fe) p v —k 
j=l i=l i=l 



Let B be as in the previous lemma and let A be large enough so that 

(In 16 + 1) 



ll >A ° 



B 



and define

$$F_{3v} := \Big\{\sum_{i=1}^{(B/\varepsilon)k} G_i \le \frac{A}{\varepsilon(1-\varepsilon)}k\Big\}.$$



From Lemma 7.5 with m = ^k 

( B,£k Ak 



\ i=l 

III 



s) 



(lnl6 + l)e 



< exp 



(In 16 + 1) 



B 



-m 



< exp 



B 



-m 



exp[-(lnl6 + l)k). 



Due to (64),

$$F_{2v} \cap F_{3v} \subset \Big\{\sum_{j=1}^{\pi(k)} X_j \le \frac{A}{\varepsilon(1-\varepsilon)}k\Big\} =: F_{4v}.$$
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Lemma 7.7 finishes the proof. □ 
Let

$$F_4(k) := \bigcap_{v \in V(k) \cap V_{1\%}} F_{4v}, \qquad F_4 := \bigcap_{k \ge (0.1\varepsilon^2)n} F_4(k).$$

Then, just like in (62),

$$P(F_4(k)^c) \le 2\exp[-k((\ln 16 + 1) - \ln 16)] = 2\exp[-k], \tag{65}$$

$$P(F_4^c) \le 2C_{2F} \exp[-0.1\varepsilon^2 n]. \tag{66}$$



Lemma 7.9. There exist $a_1 > 0$, independent of $\varepsilon$, as well as a constant $C_F < \infty$ such that

$$P(F^c) \le C_F \exp[-0.02\varepsilon^2 n],$$

provided $\varepsilon < \varepsilon_1$, where $\varepsilon_1$ is as in Lemma 7.6.

PROOF. Let $\varepsilon < \varepsilon_1$ and $v \in V_{1\%}$. We have

ir(\v\) \ 



So, 



X 



and by (62) and (66)

$$P(F_x^c) \le P(F_1^c) + P(F_4^c) \le C_{F1}\exp[-0.02\varepsilon^2 n] + 2C_{F2}\exp[-0.1\varepsilon^2 n].$$

By symmetry, $P(F_y^c) \le C_{F1}\exp[-0.02\varepsilon^2 n] + 2C_{F2}\exp[-0.1\varepsilon^2 n]$, where



3=1 



Thus 

/7t(|d|) u(\v\) 

nil 

A 



F x nF y c\2N->^—fil[ ]T x i + E Y A \c{N~>2 ai N lv } = F, 
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where 

(67) ai:= 8l^^^' 
provided $\varepsilon < 0.5$ and

$$P(F^c) \le 2C_{F1}\exp[-0.02\varepsilon^2 n] + 4C_{F2}\exp[-0.1\varepsilon^2 n] \le (2C_{F1} + 4C_{F2})\exp[-0.02\varepsilon^2 n]. \qquad \Box$$

7.4. The event G. We use the notation introduced in the previous subsection. Let $a_1$ be as in (67). Fix $0 < a_2 < a_1$.

Lemma 7.10. There exist a constant $C_G < \infty$ and $\varepsilon_2(a_2) > 0$ such that for every $\varepsilon < \varepsilon_2$

$$P(G^c) \le C_G \exp[-(300 - \ln 16)(0.1)\varepsilon^2 n].$$

PROOF. Let $v \in V(k)$. From (64)

7r(fc) p v -k k k 

j=l j=l i=l i=l 



Let 



t 7T(k) \ 



Then 



k 7 \ / k 

k \ e 1 



P(Gt)<P Ep7<- = p [Y,pJ <--*)■ 



<i=1 a 2 J a 2 e 



Let $a_o$ be as in Lemma 7.5. Let $\varepsilon_2 := a_2 a_o$. Note that $a_2 < 0.5$, so $\varepsilon_2$ is
smaller than £1 defined in Lemma 7.6. Recall that are i.i.d. random 
variables with G(e) distribution. Then, by (57), for every e < e 2 , 



$$P(G_v^c) \le \exp[-300k].$$



Let 



G(k):= f) G v , 

vev[k) 



n G(k) = n g„ c n h < « 2 e x 4 = : g - 

fc>0.1e 2 vev n vev 1% I j'=l J 
There exists a constant $0.5C_G$ such that, for $\varepsilon < \varepsilon_2$,

$$P(G^c(k)) \le \exp[-(300 - \ln 16)k],$$

$$P(G_x^c) \le 0.5C_G \exp[-(300 - \ln 16)(0.1\varepsilon^2)n].$$

Similarly $P(G_y^c) \le 0.5C_G \exp[-(300 - \ln 16)(0.1\varepsilon^2)n]$, where

u(\v\) 



G y := fl M<« 2 £^ • 

V&V,V„ I 7 = 1 J 



vev 1% 

Since $G := G_x \cap G_y$, we have that

$$P(G^c) \le C_G \exp[-(300 - \ln 16)(0.1\varepsilon^2)n],$$

provided $\varepsilon < \varepsilon_2$. $\Box$

7.5. The event K.

Lemma 7.11. Assume $\varepsilon < \varepsilon_3$, where $\varepsilon_3$ is as in Lemma 6.2. There exists a constant $C_K$ such that

$$P(K^c) \le C_K \exp[-c_K n],$$

where $c_K > 0$ is a constant depending on $\varepsilon$.

Proof. Let $v$ be an optimal alignment of $X$ and $Y$ and denote by $R_v$ the number of ones after the last cell:

$$R_v := \sum_{i=\pi(|v|)+1}^{n} X_i + \sum_{i=\nu(|v|)+1}^{n} Y_i.$$

Let

$$\beta := (0.1)\varepsilon^2.$$

We shall show that with high probability, there exists an optimal alignment such that all the ones after the last cell are contained in the interval $[n - \beta n + 1, n]$ (without loss of generality assume that $\beta n$ is an integer). In other words, we shall prove that the following event has large probability:

$$K_1 := \{\exists v \in V^* : \pi(|v|) > n - \beta n,\ \nu(|v|) > n - \beta n\}, \tag{68}$$

where $V^*$ is the set of optimal alignments.

Suppose $K_1$ holds and let $v \in V^*$ be such that $\pi(|v|) > n - \beta n$ and $\nu(|v|) > n - \beta n$. Then the number of ones after the last cell of $v$ is clearly at most $2\beta n$, since there are at most $2\beta n$ symbols after the last cell. Thus, $R_v \le 2\beta n$.
Recall that

$$N_{1v} = \sum_{i=1}^{\pi(|v|)} X_i + \sum_{i=1}^{\nu(|v|)} Y_i.$$
Obviously $N_{1v} \ge 2|v|$ and if $E_4$ holds, then every optimal $v$ satisfies $|v| \ge (0.1)\varepsilon^2 n$. Hence, if $K_1 \cap E_4$ holds, then there exists a $v \in V^*$ such that

$$R_v \le 2\beta n = 2(0.1)\varepsilon^2 n \le 2|v| \le N_{1v},$$

implying that

$$P(K^c) \le P(K_1^c) + P(E_4^c).$$

It remains to show that $K_1^c$ has exponentially small probability in $n$. Define

$$K_2 := \Big\{\exists s, t > n - \frac{\beta}{3}n : s + \sum_{i=s+1}^{n} X_i = t + \sum_{i=t+1}^{n} Y_i,\ Y_t = X_s = 1\Big\}.$$

Recall that after the last cell, only the zeros can be aligned. If, in an interval, one aligns only the zeros, it can be done in the following manner. Start from the last pair of zeros and align them. Then, disregarding all the ones, take the second last pair of zeros (i.e., the second last zero in $X$ and the second last zero in $Y$) and align them. Then, align the third last pair of zeros and so on. Doing so, the maximum number of zero-pairs (in the given interval) can be obtained. If the event $K_2$ holds, then in the interval $[n - \frac{\beta}{3}n + 1, n]$, the described way of aligning zeros allows one to align a pair of ones without disturbing the alignment of zeros. This violates the optimality, hence we immediately have the following implication: if $K_2$ holds, then for any optimal alignment $v$ either $\pi(|v|) > n - \frac{\beta}{3}n$ or $\nu(|v|) > n - \frac{\beta}{3}n$. Unfortunately, this is not enough, so we define two more events:



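$$K_3^x := \Big\{\sum_{s=n-\beta n+1}^{n-\frac{2}{3}\beta n} X_s \ge 1\Big\} \cap \Big\{\frac{2}{3}\beta n - \sum_{s=n-\frac{2}{3}\beta n+1}^{n} X_s \ge \frac{1}{3}\beta n\Big\},$$

$$K_3^y := \Big\{\sum_{s=n-\beta n+1}^{n-\frac{2}{3}\beta n} Y_s \ge 1\Big\} \cap \Big\{\frac{2}{3}\beta n - \sum_{s=n-\frac{2}{3}\beta n+1}^{n} Y_s \ge \frac{1}{3}\beta n\Big\}.$$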

The event $K_3^x$ states that among

$$X_{n-\beta n+1}, \ldots, X_{n-\frac{2}{3}\beta n}$$

there is at least one 1 and, at the same time, among

$$X_{n-\frac{2}{3}\beta n+1}, \ldots, X_n$$

there are at least $\frac{1}{3}\beta n$ zeros. The event $K_3^y$ is symmetric.

Suppose $K_3^x$ holds. Let $v$ be an optimal alignment such that $\nu(|v|) > n - \frac{\beta}{3}n$ and $\pi(|v|) \le n - \beta n$. Then there exists another optimal alignment $v'$ such that $\nu(|v'|) = \nu(|v|)$ and $\pi(|v'|) > n - \beta n$. Indeed, since $\nu(|v|) > n - \frac{\beta}{3}n$, the number of aligned 0's after the last cell is at most $\frac{\beta}{3}n$ [the maximal number of 0's on the Y-side after $\nu(|v|)$]. By $K_3^x$, we can align all those 0's from the Y-side with the zeros on the X-side that lie among $X_{n-\frac{2}{3}\beta n+1}, \ldots, X_n$. After such a realignment, the situation is the following: as previously, $Y_{\nu(|v|)}$ is aligned with $X_{\pi(|v|)}$. However, all the zeros after $Y_{\nu(|v|)}$ (on the Y-side) are aligned with the zeros after $X_{n-\frac{2}{3}\beta n+1}$ (on the X-side). The score remains optimal. By $K_3^x$, again, among $X_{n-\beta n+1}, \ldots, X_{n-\frac{2}{3}\beta n}$, there is at least one 1. Thus, without changing the score, we can align $Y_{\nu(|v|)}$ with this 1. Since the location of this 1 is at least $X_{n-\beta n+1}$, we now have a new optimal alignment $v'$ such that $\nu(|v'|) = \nu(|v|)$ and $\pi(|v'|) > n - \beta n$.
We have proven that

$$K_2 \cap K_3^x \cap K_3^y \subset K_1.$$

It remains to prove that $K_2$, $K_3^x$ and $K_3^y$ hold with large probability.
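Clearly,

$$P(K_3^{xc}) \le P\Big(\sum_{s=n-\beta n+1}^{n-\frac{2}{3}\beta n} X_s = 0\Big) + P\Big(\sum_{s=n-\frac{2}{3}\beta n+1}^{n} X_s > \frac{1}{3}\beta n\Big).$$

The first probability on the right-hand side equals $\exp[\ln(1 - \varepsilon)\frac{1}{3}\beta n]$; by Hoeffding's inequality, the second probability is bounded by $\exp[-\frac{4}{3}(\frac{1}{2} - \varepsilon)^2 \beta n]$. Thus,

$$P((K_3^x \cap K_3^y)^c) \le 2\exp[\ln(1 - \varepsilon)\tfrac{1}{3}\beta n] + 2\exp[-\tfrac{4}{3}(\tfrac{1}{2} - \varepsilon)^2 \beta n].$$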
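The Hoeffding step can be illustrated numerically. In the sketch below, $L$ stands for $\beta n$, so the second block has $m = \frac{2}{3}L$ Bernoulli($\varepsilon$) variables and the threshold is $t = \frac{1}{3}L$; Hoeffding's inequality then gives the exponent $2(t - m\varepsilon)^2/m = \frac{4}{3}(\frac{1}{2} - \varepsilon)^2 L$. The function names and the concrete values $L = 30$, $\varepsilon = 0.1$ are assumptions made only for this illustration.

```python
import math


def binom_upper_tail(m: int, p: float, t: int) -> float:
    """Exact P(S > t) for S ~ Binomial(m, p)."""
    return sum(
        math.comb(m, j) * p**j * (1.0 - p) ** (m - j) for j in range(t + 1, m + 1)
    )


def hoeffding_bound(m: int, p: float, t: float) -> float:
    """Hoeffding's bound exp(-2 (t - m p)^2 / m) on P(S >= t), valid for t > m p."""
    return math.exp(-2.0 * (t - m * p) ** 2 / m)


def no_one_probability(length: int, p: float) -> float:
    """P(all of `length` Bernoulli(p) variables are 0) = exp(ln(1 - p) * length)."""
    return (1.0 - p) ** length


if __name__ == "__main__":
    L, eps = 30, 0.1          # L plays the role of beta * n (illustrative values)
    m, t = 2 * L // 3, L // 3
    print("exact tail     :", binom_upper_tail(m, eps, t))
    print("Hoeffding bound:", hoeffding_bound(m, eps, t))
    print("closed form    :", math.exp(-(4.0 / 3.0) * (0.5 - eps) ** 2 * L))
    print("no 1 in first block:", no_one_probability(L // 3, eps))
```

The exact binomial tail is far smaller than the bound here; the bound is only needed to be exponentially small in $n$.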

Finally, let us bound $P(K_2^c)$. The event $K_2$ essentially states that among the i.i.d. Bernoulli $B(1, \varepsilon)$ random variables

$$X_1, \ldots, X_{\frac{\beta}{3}n}, \quad Y_1, \ldots, Y_{\frac{\beta}{3}n},$$

after aligning all 0's, one can align an additional pair of ones. In terms of $\xi_i$'s and $\eta_i$'s, $K_2 = K_2^x \cap K_2^y$, where




Recall that $T$ is the stopping time that shows the first time a pair of ones between the 0's occurs. Then $T - 1$ is the number of 0's before the first alignment of ones and $\sum_{i=1}^{T} \xi_i$ is the number of 1's in $X$ before the first alignment of ones. If their sum is smaller than $\frac{\beta}{3}n$, then the X-part of the aligned pair occurs before $\frac{\beta}{3}n$. The event $K_2^y$ is analogous.

To bound the events $K_2^x$ and $K_2^y$, we use Lemma 7.5. Let $A_o(1)$ be as in Lemma 7.5 and define

$$\delta := \frac{\beta}{3} \cdot \frac{1 - \varepsilon}{A_o(1)}.$$
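Clearly

$$P(K_2^{xc}) \le P(T > \delta n) + P\Big(\sum_{i=1}^{\delta n} \xi_i + \delta n > \frac{\beta}{3}n\Big).$$

Since $T$ is a geometric random variable with parameter $\varepsilon^2$, $P(T > \delta n) \le \exp[\ln(1 - \varepsilon^2)\delta n]$. Since $G_i := \xi_i + 1$ are geometric random variables with parameter $(1 - \varepsilon)$, by (58), we have

$$P\Big(\sum_{i=1}^{\delta n} \xi_i + \delta n > \frac{\beta}{3}n\Big) = P\Big(\sum_{i=1}^{\delta n} G_i > \frac{\beta}{3}n\Big) = P\Big(\sum_{i=1}^{\delta n} G_i > \frac{A_o(1)}{1 - \varepsilon}\delta n\Big) \le \exp[-\delta n].$$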

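To sum up:

$$P(K^c) \le P(K_2^{xc}) + P(K_2^{yc}) + P(K_3^{xc}) + P(K_3^{yc}) + P(E_4^c)$$
$$\le 2\exp[\ln(1 - \varepsilon)\tfrac{1}{3}\beta n] + 2\exp[-\tfrac{4}{3}(\tfrac{1}{2} - \varepsilon)^2 \beta n] + 2\exp[\ln(1 - \varepsilon^2)\delta n] + 2\exp[-\delta n] + 5\exp[-an] + 4\exp[-2(0.8\varepsilon)^2 \varepsilon n],$$

since $\varepsilon < \varepsilon_3$, so $P(E_4^c) \le 5\exp[-an] + 4\exp[-2(0.8\varepsilon)^2 \varepsilon n]$ by Lemma 6.2. $\Box$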

APPENDIX 

Proposition A.1. Let $G_1, \ldots, G_m$ be i.i.d. geometrically distributed random variables with parameter $p$. Then for every $A > 1$ and $a < 1$, with $C(A) := A - 1 - \log A$ and $C(a) := a - 1 - \log a$, we have

$$P\Big(\sum_{i=1}^{m} G_i > \frac{A}{p}m\Big) \le \exp[-C(A)m], \tag{69}$$

$$P\Big(\sum_{i=1}^{m} G_i \le \frac{a}{p}m\Big) \le \exp[-C(a)m]. \tag{70}$$
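Both bounds can be compared with exact tail probabilities, because a sum of $m$ geometric variables (supported on $1, 2, \ldots$) exceeds an integer $n$ exactly when the first $n$ Bernoulli($p$) trials contain fewer than $m$ successes. The sketch below uses this identity; the function names and the parameter values ($m = 10$, $p = 0.5$, $A = 3$, $a = 0.6$) are illustrative assumptions, chosen so that $\frac{A}{p}m$ and $\frac{a}{p}m$ are integers.

```python
import math


def binom_cdf(n: int, p: float, k: int) -> float:
    """Exact P(Bin(n, p) <= k)."""
    if k < 0:
        return 0.0
    return sum(
        math.comb(n, j) * p**j * (1.0 - p) ** (n - j) for j in range(min(k, n) + 1)
    )


def geom_sum_upper_tail(m: int, p: float, n: int) -> float:
    """P(G_1 + ... + G_m > n): fewer than m successes among the first n trials."""
    return binom_cdf(n, p, m - 1)


def geom_sum_lower_tail(m: int, p: float, n: int) -> float:
    """P(G_1 + ... + G_m <= n): at least m successes among the first n trials."""
    return 1.0 - binom_cdf(n, p, m - 1)


def rate(x: float) -> float:
    """The exponent C(x) = x - 1 - log x from the proposition."""
    return x - 1.0 - math.log(x)


if __name__ == "__main__":
    m, p = 10, 0.5
    A, a = 3.0, 0.6
    # (A / p) * m = 60 and (a / p) * m = 12 are integers for these values.
    print(geom_sum_upper_tail(m, p, 60), "<=", math.exp(-rate(A) * m))
    print(geom_sum_lower_tail(m, p, 12), "<=", math.exp(-rate(a) * m))
```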

Proof. Let us recall (55). Let $A > 1$, $n = \frac{A}{p}m$ and $\alpha = \frac{p}{A} < p$. From (55), we get

$$P\Big(\sum_{i=1}^{m} G_i > \frac{A}{p}m\Big) \le P\Big(\sum_{i=1}^{\frac{A}{p}m} Y_i \le \alpha n\Big) \le \Big(\frac{p}{\alpha}\Big)^{\alpha n}\exp[(\alpha - p)n] = A^m \exp[(1 - A)m] = \exp[(\ln A + (1 - A))m],$$

where $Y_i$ are i.i.d. Bernoulli random variables with parameter $p$. This finishes the proof of (69).

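If $p > a$, then (70) trivially holds. If $p = a$, then the probability in (70) equals $p^m = \exp[(\ln a)m] = \exp[-\ln\frac{1}{a}\,m]$. Hence, we consider only the case $p < a < 1$. From (55), it easily follows: let $X_i \sim B(1, p)$. Then, with $1 > \alpha > p$,

$$P\Big(\sum_{i=1}^{n} X_i \ge n\alpha\Big) \le \exp\Big[\Big(\alpha\ln\Big(\frac{p}{\alpha}\Big) + (\alpha - p)\Big)n\Big]. \tag{71}$$

We have that

$$P\Big(\sum_{i=1}^{m} G_i \le \frac{a}{p}m\Big) = P\Big(\sum_{j=1}^{\frac{a}{p}m} X_j \ge m\Big). \tag{72}$$

With $n := \frac{a}{p}m$ and $\alpha := \frac{p}{a}$, the inequality (71) states

$$P\Big(\sum_{j=1}^{\frac{a}{p}m} X_j \ge m\Big) \le \exp\Big[\Big(\frac{p}{a}\ln a + \frac{p(1 - a)}{a}\Big)n\Big] = \exp[(\ln a + 1 - a)m]. \qquad \Box$$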



Proof of Lemma 7.5. The right-hand side of (70) is smaller than $\exp[-300m]$, provided

$$a < \exp[-301] =: a_o.$$

That proves (57). To get (58), note that for every $C > 0$, it is possible to choose $A$ so big that $\ln A + (1 - A) < -C$. $\Box$
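Both choices of constants in this proof can be sanity-checked numerically. The sketch below evaluates $C(a) = a - 1 - \log a$ near $a_o = \exp[-301]$ and searches for the smallest integer $A$ with $\ln A + (1 - A) < -C$; the helper names are hypothetical, and restricting the search to integer $A$ is a simplification made only for this illustration.

```python
import math


def chernoff_rate(x: float) -> float:
    """C(x) = x - 1 - log x, the exponent appearing in (69) and (70)."""
    return x - 1.0 - math.log(x)


def smallest_integer_A(C: float) -> int:
    """Smallest integer A >= 2 with ln A + (1 - A) < -C (illustrative search)."""
    A = 2
    while math.log(A) + (1 - A) >= -C:
        A += 1
    return A


if __name__ == "__main__":
    a_o = math.exp(-301.0)
    # For a below a_o the rate exceeds 300, so exp[-C(a) m] < exp[-300 m].
    print("C(a_o)      =", chernoff_rate(a_o))
    print("C(exp(-302)) =", chernoff_rate(math.exp(-302.0)))
    # Constants A for C = 1 and for C = ln 16 + 1.
    print("A for C = 1        :", smallest_integer_A(1.0))
    print("A for C = ln 16 + 1:", smallest_integer_A(math.log(16) + 1.0))
```

Because $\ln A + 1 - A$ is decreasing for $A > 1$ and tends to $-\infty$, the search always terminates.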
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